As mentioned in the SWIFT guide, the ETL command requires that all data be entered into SWIFT templates, one template per resource type. Typically, a "main" resource type in eagle-i (e.g. Plasmid) will require several secondary resource types (e.g. Person, Journal Article, Construct insert, ...).

Entering data correctly is key for the ETL process to succeed without creating duplicates or non-conforming data. Data may be entered manually into a SWIFT template, row by row. However, the more common scenario is to use data that already exists electronically: pre-process it (see below) and copy individual columns into the SWIFT template.


Understanding the original data and pre-processing it

...

  • You must always enter the Organization to which the resource is associated, either by name or by URI
    • It is best to use the SWEET to add the Organization (e.g. lab) to which these resources are associated, and then reference its name or URI in the files.
  • If there is more than one value for a given column, enter the values separated by ; (semicolon). Conversely, check your input file for the presence of ; in values that are not meant to be split, and replace it with a different character.
  • The first two columns (hidden) of a template are reserved for metadata. Please do not modify them or the names of the tabs.
  • Every resource (primary or secondary) needs to have a name and a type as a minimum. For simplification, the type column is omitted if there are no possible subclasses (e.g. Person, Human Subject). If the template has a type column, you must enter a value.
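
The rules above can be checked on the original data before copying columns into a template. The following is a minimal Python sketch, not part of the SWIFT toolkit: the helper names and the column keys "name", "type" and "organization" are hypothetical, and the choice of a comma as the replacement character is an assumption.

```python
def sanitize_value(value, replacement=","):
    """Replace semicolons in a value that is not meant to be split."""
    return value.replace(";", replacement)


def join_values(values, replacement=","):
    """Build one cell from several values, separated by semicolons.

    Empty values are dropped, and semicolons inside a single value are
    replaced so they are not mistaken for separators.
    """
    cleaned = [sanitize_value(v.strip(), replacement) for v in values if v.strip()]
    return ";".join(cleaned)


def check_row(row, has_type_column=True):
    """Return a list of problems for one row (a dict of column key -> text).

    The keys "name", "type" and "organization" are assumed for illustration.
    """
    problems = []
    if not row.get("name", "").strip():
        problems.append("missing name")
    if has_type_column and not row.get("type", "").strip():
        problems.append("missing type")
    if not row.get("organization", "").strip():
        problems.append("missing organization")
    return problems
```

For example, check_row({"name": "p1", "type": "", "organization": "Lab"}) reports a missing type, which should be fixed before the row goes into a template that has a type column.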

The actual ETL process

Name the files meaningfully (expand)

Run the ETL on the secondary resource type files (i.e. linked resources) first, then on the main resource type file.
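
As an illustration of this ordering, here is a small Python sketch that puts secondary resource type files ahead of the main one. The function name, the file names and the (filename, resource type) pairing are hypothetical and only serve the example; the sketch does not invoke the ETL command itself.

```python
def etl_order(files, main_type):
    """Order files so that secondary resource types come before the main one.

    files: list of (filename, resource_type) pairs (assumed structure).
    The sort is stable, so secondary files keep their original relative order.
    """
    return sorted(files, key=lambda f: f[1] == main_type)


# Hypothetical file names for a Plasmid upload.
files = [
    ("plasmid.xlsx", "Plasmid"),
    ("person.xlsx", "Person"),
    ("journal_article.xlsx", "Journal Article"),
]
ordered = etl_order(files, "Plasmid")
```

Here ordered lists the Person and Journal Article files first and the Plasmid file last.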

...