As mentioned in the SWIFT guide, the ETL command requires that all data be entered into SWIFT templates, one template per resource type. Typically, a "main" resource type in eagle-i will require several ancillary resource types.

Entering data correctly is key to ensuring that the ETL process succeeds without creating duplicates or non-conforming data. Data may be entered into a SWIFT template manually; more commonly, however, the data already exists electronically and is uploaded.
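As a minimal pre-check before anything is entered into a template, likely duplicates can be flagged in the source records. This is a generic sketch and not part of the ETL command itself. It assumes the records have been loaded as Python dictionaries and that a hypothetical `name` field serves as the label. Labels are compared after lowercasing and collapsing whitespace.

```python
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Lowercase a label and collapse runs of whitespace into single spaces."""
    return " ".join(label.lower().split())


def find_duplicate_labels(rows, label_field="name"):
    """Return {normalized label: [row indices]} for labels that occur more than once."""
    positions = defaultdict(list)
    for index, row in enumerate(rows):
        positions[normalize_label(row[label_field])].append(index)
    return {label: idx for label, idx in positions.items() if len(idx) > 1}


# Hypothetical source records; "name" is an assumed field, not an eagle-i property.
rows = [
    {"name": "Flow Cytometry Core"},
    {"name": "flow  cytometry core"},
    {"name": "Imaging Lab"},
]
print(find_duplicate_labels(rows))  # {'flow cytometry core': [0, 1]}
```

Flagged rows still need human review. Two records with the same label may be genuinely distinct resources.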


Understanding the original data and pre-processing it

Data that exists electronically is typically stored in a relational database and made available as a database dump, or is accessible through an API (e.g. in JSON format). A few transformations are usually needed to fit the original data into a SWIFT template. Because these transformations depend heavily on the nature of the original data, they must be developed on a case-by-case basis. When mapping the data from its original schema to the eagle-i ontology, the following scenarios may be encountered:

  • Original data may need to be split into multiple eagle-i resource types
  • For a given type, there may be a one-to-one correspondence between a field (e.g. a column) and a property in the eagle-i ontology
  • For a given type, a field in the original data may need to be split into multiple eagle-i ontology properties
  • For a given type, fields in the original data may need to be combined into one eagle-i ontology property
  • A controlled vocabulary field in the original data may need to be mapped to a term in one of eagle-i's referenced taxonomies
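The structural scenarios listed above can be illustrated with a short transformation script. This is a sketch only: the source columns, the resource type names (`Laboratory`, `Person`, `Instrument`) and the property names are hypothetical placeholders, not actual eagle-i ontology identifiers. One source row is split into records of several types, one field is mapped one-to-one, one field is split in two, and two fields are combined into one. The records are then grouped by resource type, mirroring the one-template-per-type requirement.

```python
from collections import defaultdict


def transform(row):
    """Split one source row into records for several hypothetical resource types."""
    # Field split: naive name split, everything before the last space is the first name.
    first, _, last = row["director"].rpartition(" ")
    return {
        "Laboratory": [{
            "name": row["lab_name"],                               # one-to-one
            "location": f"{row['building']}, room {row['room']}",  # fields combined
        }],
        # Split into other resource types, linked back to the lab by its name.
        "Person": [{
            "first_name": first,
            "last_name": last,
            "director_of": row["lab_name"],
        }],
        "Instrument": [{
            "name": row["instrument"],
            "located_in": row["lab_name"],
        }],
    }


def transform_all(rows):
    """Collect transformed records per resource type (one future template each)."""
    by_type = defaultdict(list)
    for row in rows:
        for resource_type, records in transform(row).items():
            by_type[resource_type].extend(records)
    return dict(by_type)


# Hypothetical row from a database dump.
source_rows = [{
    "lab_name": "Genomics Core",
    "director": "Jane Smith",
    "building": "Research Tower",
    "room": "4-120",
    "instrument": "DNA sequencer",
}]
templates = transform_all(source_rows)
print(templates["Person"])
# [{'first_name': 'Jane', 'last_name': 'Smith', 'director_of': 'Genomics Core'}]
```

Real scripts will differ for each data source. Splitting names on the last space, for example, is only a placeholder rule.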

We usually write ad-hoc scripts that perform these transformations on the original data before copying and pasting individual columns into a SWIFT template. For controlled vocabulary fields, we produce mapping tables with the help of our domain experts.
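A mapping table for a controlled vocabulary field might be applied along the following lines. This is a minimal sketch. The two-column CSV layout (`source_term`, `eagle_i_term`) and the term labels are assumptions made for illustration; they are not an eagle-i file format or actual taxonomy terms. Source values are matched case-insensitively. Values without a mapping are left blank and reported rather than guessed.

```python
import csv
import io

# Hypothetical mapping table produced with domain experts.
MAPPING_CSV = """source_term,eagle_i_term
FACS,Flow cytometer
flow cytometer,Flow cytometer
confocal,Confocal microscope
"""


def load_mapping(text):
    """Read the mapping table into a dict keyed by the lowercased source term."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["source_term"].strip().lower(): row["eagle_i_term"].strip() for row in reader}


def map_terms(values, mapping):
    """Return (mapped terms, unmapped source values), preserving row order."""
    mapped, unmapped = [], []
    for value in values:
        term = mapping.get(value.strip().lower())
        if term is None:
            unmapped.append(value)
            mapped.append("")  # blank cell keeps the column aligned with the other columns
        else:
            mapped.append(term)
    return mapped, unmapped


mapping = load_mapping(MAPPING_CSV)
mapped, unmapped = map_terms(["FACS", "Confocal", "mass spec"], mapping)
print(mapped)    # ['Flow cytometer', 'Confocal microscope', '']
print(unmapped)  # ['mass spec']
```

Keeping a blank cell for each unmapped value means the resulting column can still be pasted alongside the other columns. The list of unmapped values can go back to the domain experts so they can extend the table.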

Knowing what templates are needed

Rules for filling the templates

The actual ETL process
