...

We usually write ad-hoc scripts that perform such transformations on the original data before copy-pasting individual columns into a SWIFT template. For controlled vocabulary fields, we produce mapping tables with the help of our domain experts and use them during these pre-processing steps.
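As an illustration of the kind of pre-processing step described above, here is a minimal sketch of applying a mapping table to one column of source data. The function names and the example vocabulary terms are hypothetical and are not part of the SWIFT toolkit; a real mapping table would come from your domain experts.

```python
def load_mapping(rows):
    """Build a lookup from a normalized source value to a controlled-vocabulary term.

    `rows` is an iterable of (source_value, vocabulary_term) pairs, e.g. the
    rows of a two-column mapping spreadsheet exported as CSV.
    """
    return {source.strip().lower(): term for source, term in rows}


def map_column(values, mapping):
    """Map raw column values to controlled-vocabulary terms.

    Returns the mapped column (empty string where no mapping exists) and a
    sorted list of the raw values that could not be mapped, so they can be
    reviewed with a domain expert instead of being silently dropped.
    """
    mapped = []
    unmapped = set()
    for value in values:
        key = value.strip().lower()
        if key in mapping:
            mapped.append(mapping[key])
        else:
            mapped.append("")
            unmapped.add(value)
    return mapped, sorted(unmapped)


# Hypothetical mapping table and source column, for illustration only.
mapping = load_mapping([("Mouse", "Mus musculus"), ("human", "Homo sapiens")])
column, needs_review = map_column([" mouse", "HUMAN", "rat"], mapping)
print(column)
print(needs_review)
```

Reporting the unmapped values separately is a design choice: it keeps the mapped column ready to paste into a template while making gaps in the mapping table visible.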

...

Generating the templates

SWIFT templates and maps must be generated with the toolkit version that corresponds to your eagle-i repository version. This is very important because ontology versions matter: if the versions don't match, you might end up with non-conforming or incomplete resource descriptions.

When ETLing a primary type, there are usually resources of other types that are related to it (e.g. People, Organizations, Publications). Information for each related type must be entered in a separate template of its own. For example, when ETLing a Monoclonal Antibody, you'll have separate files for related Hybridoma Cell Lines, People and Publications.

Use the generate-inputs command described in the SWIFT toolkit guide to generate the following templates:

  • A template for your primary resource type - use the most specific type possible (e.g. Monoclonal Antibody and not Reagent)
  • Templates for all the secondary resources you need - consult the eagle-i ontology browser page of your primary resource type to understand what types can be linked

Do not modify the directories that are created by this command.

Adding your data to SWIFT templates

Create a directory dedicated to your data files, with a subdirectory for each resource type. Copy the templates you will use to the appropriate subdirectory. Note that more than one file of a given resource type may be used (e.g. you could have a file per lab if you're ETLing multiple labs). We recommend that, before adding resource data to a template, you rename the file so that its name is meaningful, since all resources ETLd from a file will be tagged with that file's name. We've found it useful to use a name that reflects the date of the ETL, e.g. 20150627-NYSCF-human-study.xls.
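The copy-and-rename step can be scripted. The following is a minimal sketch, assuming one subdirectory per resource type and a date-prefixed file name like the one above; the helper name, directory names and paths are hypothetical and are not part of the SWIFT toolkit.

```python
import shutil
from datetime import date
from pathlib import Path


def stage_template(template, data_root, resource_type, label, etl_date):
    """Copy a generated template into data_root/<resource_type>/ under a dated name.

    The generated template is copied, never moved, so the directories created
    by the toolkit stay unmodified.
    """
    template = Path(template)
    target_dir = Path(data_root) / resource_type
    target_dir.mkdir(parents=True, exist_ok=True)
    # e.g. 20150627-NYSCF-human-study.xls
    target = target_dir / f"{etl_date:%Y%m%d}-{label}{template.suffix}"
    if target.exists():
        # Refuse to overwrite a file that may already contain entered data.
        raise FileExistsError(target)
    shutil.copy2(template, target)
    return target
```

A call such as `stage_template("generated/template.xls", "etl-data", "monoclonal-antibody", "NYSCF-human-study", date(2015, 6, 27))` would create `etl-data/monoclonal-antibody/20150627-NYSCF-human-study.xls`, assuming a generated template exists at that (hypothetical) source path.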

...