As mentioned in the SWIFT guide, the ETL command requires that all data be entered into SWIFT templates, one template per resource type. Typically, a "main" resource type in eagle-i (e.g. Plasmid) requires several ancillary resource types (e.g. Person, Journal Article, Construct insert, ...).

Entering data correctly is key for the ETL process to succeed without creating duplicate or non-conforming records. Data may be entered into a SWIFT template manually; more commonly, however, the data to be uploaded already exists in electronic form.
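One way to reduce duplicates in ancillary templates such as Person is to normalize names before listing them, so that trivially different spellings collapse into a single entry. The following is a minimal Python sketch under that assumption; `normalize_key` and `dedupe` are hypothetical helpers, not part of the SWIFT toolkit. Stripping accents and case may merge names that are genuinely different, so the result should still be reviewed by someone who knows the data.

```python
import unicodedata


def normalize_key(name: str) -> str:
    """Build a comparison key: strip accents, collapse whitespace, ignore case."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.split()).casefold()


def dedupe(names: list[str]) -> list[str]:
    """Keep the first spelling of each distinct name, preserving input order."""
    seen: set[str] = set()
    unique: list[str] = []
    for name in names:
        key = normalize_key(name)
        if key and key not in seen:  # empty keys (blank cells) are skipped
            seen.add(key)
            unique.append(" ".join(name.split()))  # tidy whitespace only
    return unique


if __name__ == "__main__":
    authors = ["Jane  Doe", "jane doe", "José Díaz", "Jose Diaz", "  "]
    print(dedupe(authors))  # ['Jane Doe', 'José Díaz']
```

The first spelling encountered is kept, so sorting the input by a preferred source beforehand controls which variant ends up in the template.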


Understanding the original data and pre-processing it

Data that exists electronically is typically stored in a relational database and available as a database dump, or accessible through an API (e.g. in JSON format). A few transformations are usually needed to fit the original data into a SWIFT template. This step depends heavily on the nature of the original data and therefore has to be developed case by case. In mapping the data from its original schema to the eagle-i ontology, the following scenarios may be encountered:

We usually write ad hoc scripts that perform these transformations on the original data before copying and pasting individual columns into a SWIFT template. For controlled vocabulary fields, we produce mapping tables with the help of our domain experts and apply them during these pre-processing steps.
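Applying a mapping table to a controlled vocabulary field might look like the minimal Python sketch below. It assumes the table is a two-column CSV with hypothetical headers `source_term` and `eagle_i_term`; the sample terms are illustrative only, and real target labels must come from the eagle-i ontology for your repository version. Unmapped values are returned separately so they can be reviewed with the domain experts instead of being silently dropped.

```python
import csv
import io

# Illustrative mapping table; the column names and terms are hypothetical.
MAPPING_CSV = """source_term,eagle_i_term
amp,ampicillin
kan,kanamycin
"""


def load_mapping(csv_text: str) -> dict[str, str]:
    """Read a two-column mapping table into a case-insensitive lookup."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["source_term"].strip().casefold(): row["eagle_i_term"].strip()
        for row in reader
    }


def map_terms(values: list[str], mapping: dict[str, str]) -> tuple[list[str], list[str]]:
    """Translate source values; return (mapped, unmapped) for review."""
    mapped: list[str] = []
    unmapped: list[str] = []
    for value in values:
        target = mapping.get(value.strip().casefold())
        if target is None:
            unmapped.append(value)  # needs a new row in the mapping table
        else:
            mapped.append(target)
    return mapped, unmapped


if __name__ == "__main__":
    mapping = load_mapping(MAPPING_CSV)
    print(map_terms(["Amp", "kan ", "tet"], mapping))
    # (['ampicillin', 'kanamycin'], ['tet'])
```

Keeping the mapping in a plain CSV lets domain experts edit it directly, while the script stays unchanged between runs.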


Knowing what templates are needed

SWIFT templates must be generated using the toolkit version that corresponds to your eagle-i repository version. This is very important: if the versions do not match, ontology terms used in the templates may not be found during the ETL process. Generate the following templates:

Guidelines for filling the templates

Templates include different kinds of columns:

Consult the tooltips in the header rows for more information about what is expected.

Most columns (all except the "main" resource name and type) accept multiple values. Separate the values with a semicolon.
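When pre-processing scripts produce several values for one cell, they can be joined before pasting. The minimal Python sketch below assumes a bare `;` with no surrounding spaces, and it treats a semicolon inside a single value as an error, on the assumption that it would be read as a separator. `to_cell` and `to_single_cell` are hypothetical helper names.

```python
def to_cell(values: list[str]) -> str:
    """Join values into one multi-valued cell, separated by semicolons."""
    cleaned: list[str] = []
    for raw in values:
        value = raw.strip()
        if not value:
            continue  # skip blanks so the cell has no stray separators
        if ";" in value:
            raise ValueError(f"value contains the separator ';': {value!r}")
        if value not in cleaned:
            cleaned.append(value)  # drop exact repeats
    return ";".join(cleaned)


def to_single_cell(values: list[str]) -> str:
    """For single-valued columns (main resource name and type)."""
    cell = to_cell(values)
    if ";" in cell:  # values cannot contain ';', so this means >1 value
        raise ValueError(f"column accepts only one value, got: {cell!r}")
    return cell


if __name__ == "__main__":
    print(to_cell(["Jane Doe", " John Smith ", "", "Jane Doe"]))
    # Jane Doe;John Smith
```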

The actual ETL process

Run the ETL on the linked resources first, then on the main resource
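With several ancillary templates, a safe processing order can be derived from which templates link to which. The minimal Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The dependency map is hypothetical and must be adapted to the templates you actually generated; it only computes an order and does not invoke the ETL command itself.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each template lists the templates whose
# resources it links to (those must be loaded before it).
TEMPLATE_LINKS: dict[str, set[str]] = {
    "Plasmid": {"Person", "Journal Article", "Construct insert"},
    "Journal Article": {"Person"},
    "Person": set(),
    "Construct insert": set(),
}


def etl_order(links: dict[str, set[str]]) -> list[str]:
    """Return template names so that every linked resource precedes its referrer.

    Raises graphlib.CycleError if the templates link to each other in a loop.
    """
    return list(TopologicalSorter(links).static_order())


if __name__ == "__main__":
    for name in etl_order(TEMPLATE_LINKS):
        print(name)  # the main resource (Plasmid) is printed last
```

A topological sort is used here because the main resource depends on all ancillary ones, and some ancillary resources may in turn depend on others.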

Special rules for embedded instances