As mentioned in the SWIFT guide, the ETL command requires that all data be entered into SWIFT templates - one template per resource type. Typically, a "main" resource type in eagle-i (e.g. Plasmid) will require several ancillary resource types (e.g. Person, Journal Article, Construct insert, ...).

Entering data correctly is key for the ETL process to succeed without creating duplicates or non-conforming data. Data may be entered manually into a SWIFT template. However, the more common scenario is that of uploading data that already exists electronically.


Understanding the original data and pre-processing it

Data that exists electronically will typically be stored in a relational database and be accessible via a database dump or through an API (e.g. in JSON format). It is usually necessary to perform a few transformations on the original data in order to fit it into a SWIFT template. This step is highly dependent on the nature of the original data, and hence will need to be developed on a case-by-case basis. In mapping the data from its original schema to the eagle-i ontology, the following scenarios may be encountered:

We usually write ad-hoc scripts that perform such transformations on the original data before copy-pasting individual columns into a SWIFT template. For controlled vocabulary fields, we produce mapping tables with the help of our domain experts and use them during these pre-processing steps.
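As an illustration of the kind of ad-hoc script described above, here is a minimal Python sketch that applies a mapping table to one controlled vocabulary field and prints the resulting column, one value per line, ready to be pasted into a template. The field names (`name`, `species`), the sample records and the contents of `SPECIES_MAP` are hypothetical; a real mapping table would come from your domain experts and your own source schema.

```python
# Hypothetical mapping table: source value (lowercased) -> ontology term label.
SPECIES_MAP = {
    "mouse": "Mus musculus",
    "human": "Homo sapiens",
}


def map_column(records, field, mapping):
    """Map one source field through a mapping table.

    Returns the mapped column (empty string where no term was found)
    and the set of source values that had no mapping.
    """
    mapped, unmapped = [], set()
    for rec in records:
        raw = (rec.get(field) or "").strip()
        term = mapping.get(raw.lower())
        if term is None and raw:
            unmapped.add(raw)
        mapped.append(term or "")
    return mapped, unmapped


# Hypothetical records, e.g. loaded from a database dump or a JSON API.
records = [
    {"name": "pABC-1", "species": "Mouse"},
    {"name": "pABC-2", "species": " human "},
    {"name": "pABC-3", "species": "zebrafish"},
]

mapped, unmapped = map_column(records, "species", SPECIES_MAP)
print("\n".join(mapped))  # one value per line, in the original row order
print("Unmapped values to review:", sorted(unmapped))
```

Collecting the unmapped values separately, rather than guessing a term, makes it easy to hand them back to a domain expert before anything reaches the template.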

Templates

SWIFT templates need to be generated using the toolkit version that corresponds to your eagle-i repository version. This is very important: if the versions don't match, you might end up with ontology terms that are not found during the ETL process.

When ETLing a "main" type, there are usually resources of other types that are related to it (e.g. People, Organizations, Publications). It is necessary to enter information for these related types in a template of their own. For example, when ETLing a Monoclonal Antibody, you'll have separate files for related Hybridoma Cell Lines, People and Publications.
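When the original data stores related resources inline with the main records, a small script can pull them out into their own list before they are entered in a separate template. The sketch below is one possible approach in Python, not part of the SWIFT toolkit. It collects the unique values of a hypothetical `contact` field, treating names that differ only in case or spacing as the same person. The record layout and the matching rule are assumptions; real data may need stricter or looser matching.

```python
def split_related(main_rows, related_field):
    """Collect unique related-resource names, keeping first-seen order.

    Names are compared after collapsing whitespace and ignoring case;
    the first spelling encountered is the one that is kept.
    """
    seen = {}
    for row in main_rows:
        name = " ".join((row.get(related_field) or "").split())
        if name:
            seen.setdefault(name.casefold(), name)
    return list(seen.values())


# Hypothetical main-type records with an inline related person.
antibodies = [
    {"name": "Anti-X mAb", "contact": "Jane Doe"},
    {"name": "Anti-Y mAb", "contact": "jane  doe"},
    {"name": "Anti-Z mAb", "contact": "John Roe"},
]

people = split_related(antibodies, "contact")
print(people)  # ['Jane Doe', 'John Roe']
```

If you deduplicate this way, the main template should refer to each related resource using the same spelling that was kept for the related template.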

Generate the following templates:

Guidelines for filling the templates

Templates include different kinds of columns:

Consult the tooltips in the header rows for more information about what is expected.

A few more guidelines and tips:

The actual ETL process

First ETL the secondary resource type files (i.e. linked resources), then the main resource type file.
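If you drive the process from a script, the ordering rule can be encoded explicitly so the main file is never processed first by accident. The following Python sketch only computes the order; it does not invoke the ETL command, and the file names are hypothetical.

```python
def etl_order(files, main_file):
    """Return the files with all secondary templates first and the main one last."""
    if main_file not in files:
        raise ValueError(f"main file {main_file!r} is not in the list of files")
    secondary = [f for f in files if f != main_file]
    return secondary + [main_file]


# Hypothetical template files for a Monoclonal Antibody upload.
files = [
    "monoclonal_antibody.xlsx",
    "person.xlsx",
    "hybridoma_cell_line.xlsx",
    "publication.xlsx",
]

order = etl_order(files, "monoclonal_antibody.xlsx")
for path in order:
    print(path)
```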

Special rules for embedded instances