As mentioned in the SWIFT guide, the ETL command requires that all data be entered into SWIFT templates - one template per resource type. Typically, a "main" resource type in eagle-i (e.g. Plasmid) will require several ancillary resource types (e.g. Person, Journal Article, Construct insert, ...).

Entering data correctly is key for the ETL process to succeed without creating duplicates or non-conforming data. Data may be entered manually into a SWIFT template. However, the more common scenario is that of uploading data that already exists electronically.

...

We usually write ad-hoc scripts that perform such transformations on the original data before copy-pasting individual columns into a SWIFT template. For controlled vocabulary fields, we produce mapping tables with the help of our domain experts and use them during these pre-processing steps.
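A pre-processing script of this kind might look like the following minimal sketch. It assumes the original data is available as CSV text; the function names, the `organism` column and the mapping entries are hypothetical and for illustration only (the term labels have not been checked against the eagle-i ontology).

```python
import csv
import io

# Hypothetical mapping table agreed with domain experts:
# lower-cased source value -> ontology term label (illustrative entries only).
VOCAB_MAP = {
    "e. coli": "Escherichia coli",
    "human": "Homo sapiens",
}


def extract_column(csv_text, column):
    """Return the values of one named column from CSV text, in row order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def map_terms(values, mapping):
    """Map raw values to controlled vocabulary terms.

    Returns (mapped, unmapped): `mapped` keeps the input order and holds an
    empty string where no mapping exists; `unmapped` is the set of raw values
    that still need a decision from a domain expert.
    """
    mapped, unmapped = [], set()
    for raw in values:
        key = raw.strip().lower()
        if key in mapping:
            mapped.append(mapping[key])
        else:
            mapped.append("")
            unmapped.add(raw.strip())
    return mapped, unmapped


if __name__ == "__main__":
    source = "name,organism\npA,E. coli\npB,Human \npC,yeast\n"
    terms, missing = map_terms(extract_column(source, "organism"), VOCAB_MAP)
    # One value per line, ready to paste into a single template column.
    print("\n".join(terms))
    print("Values without a mapping:", sorted(missing))
```

Reporting the unmapped values separately, rather than guessing, keeps non-conforming terms out of the template until an expert has extended the mapping table.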

 

Knowing what templates are needed

...

SWIFT templates must be generated with the toolkit version that corresponds to your eagle-i repository version. This is very important: if the versions don't match, you might end up with ontology terms that are not found during the ETL process. Generate the following templates:

  • A template for your "main" resource - use the most specific type possible (e.g. Plasmid and not Reagent)
  • Templates for linked resources you need - consult the eagle-i ontology browser page of your main resource type to understand what types can be linked

Guidelines for filling the templates

Templates include different kinds of columns:

  • Plain text columns - you may enter any text. Avoid semicolons, as they are used to separate multiple values (more on this below)
  • Resource columns - represent linked resources. A value that matches the label of a resource in the repository (or a new resource) is expected. If you know the resource's URI, you may enter it here and thus avoid a look-up operation during ETL
    • Resource columns are followed by Resource type columns; the resource type column is sometimes omitted if there are no possible values (i.e. the resource type has no subclasses)
  • Referenced taxonomies - a value that matches an eagle-i ontology term or synonym is expected

Consult the tooltips in the header rows for more information about what is expected.

Most columns (except for the "main" resource name and type) accept multiple values. Use a semicolon to separate the values.
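The semicolon convention can be applied in a pre-processing script along the lines of the sketch below. The helper names are hypothetical, and replacing a stray semicolon with a comma is only one possible choice. Whether the ETL tolerates whitespace around the separator is not stated here, so the sketch adds none.

```python
SEPARATOR = ";"


def clean_text(value, replacement=","):
    """Replace semicolons in a plain text value so it is not read as several values."""
    return value.replace(SEPARATOR, replacement).strip()


def join_values(values):
    """Build the content of one multi-valued cell.

    Blank entries and repeated entries are dropped; a value that itself
    contains a semicolon is rejected rather than silently split.
    """
    kept = []
    for value in values:
        value = value.strip()
        if SEPARATOR in value:
            raise ValueError(f"value contains a semicolon: {value!r}")
        if value and value not in kept:
            kept.append(value)
    return SEPARATOR.join(kept)


if __name__ == "__main__":
    print(join_values(["Jane Doe", " John Roe ", "", "Jane Doe"]))
    print(clean_text("AmpR; KanR"))
```

Remember that the "main" resource name and type columns take a single value, so `join_values` should not be used for them.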

The actual ETL process

ETL the linked resources first, then the main resource
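When several templates are involved, the processing order can be derived from the links between them. The sketch below is one way to do that with Python's standard `graphlib` module (Python 3.9 or later). The `LINKS` table is a hypothetical example built from the resource types mentioned earlier on this page, and the function only computes an order; it does not run the ETL command.

```python
from graphlib import TopologicalSorter

# Hypothetical link table: template type -> linked resource types it refers to.
LINKS = {
    "Plasmid": {"Person", "Journal Article", "Construct insert"},
}


def etl_order(links):
    """Return template types ordered so that linked resources come before
    the resources that refer to them."""
    return list(TopologicalSorter(links).static_order())


if __name__ == "__main__":
    for template in etl_order(LINKS):
        print(template)
```

The relative order of the linked resource types is not significant in this example; only the position of the main resource, which comes last, matters.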

Special rules for embedded instances
