...

Embedded resources are instances of Embedded Classes (see the eagle-i ontology browser). Embedded resources only make sense as part of another, main resource. As such, they can only be created or updated as part of a creation or update operation for that main resource. Examples of embedded classes are Construct Inserts (embedded in Constructs) and Diagnosis (embedded in Human Subjects).

In the context of ETL and SWIFT templates, embedded resources are a special case of linked resources. If a given resource has only one embedded resource, you can enter its information directly in the SWIFT template row for the main resource. If a given resource has more than one embedded resource, use the following procedure:

  • In the template for the main resource, enter a semicolon-separated list of labels for the embedded resources. Fill in the type column if it exists (a single type value is required, not a semicolon-separated list). This creates "empty" embedded resources.
  • ETL this file (see below)
  • To ETL the rest of the properties of the embedded resources:
    • Generate a template/map for the embedded resource class
      • Templates generated for embedded classes will have two additional columns: Main Resource Name and Main Resource Type
    • Fill the template with the information for the embedded resources (one row per embedded resource). Make sure the Main Resource Name and Main Resource Type entries match the label and type you used when ETLing the main resource.
    • Run the ETL command with the embedded resource file as input. Note that the -p and -eid parameters will be ignored if they are present.
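The second half of this procedure can be partly scripted. The following Python sketch makes several assumptions:

- The templates are handled as CSV files. Check the SWIFT toolkit guide for the file formats your ETL command accepts.
- The main template's name column is called `Resource Name` and its embedded-label column is called `Construct Inserts`.
- The embedded template's label column is also called `Resource Name`.

All three column names and the helpers `embedded_rows` and `to_csv` are hypothetical, so adjust them to the headers of your own generated templates. Only the `Main Resource Name` and `Main Resource Type` columns come from the embedded-class template described above. The sketch emits one embedded-resource row per semicolon-separated label, ready for the remaining properties to be filled in.

```python
import csv
import io

# HYPOTHETICAL column names: check the headers of your own generated templates.
MAIN_NAME_COL = "Resource Name"          # label column of the main template
EMBEDDED_LABELS_COL = "Construct Inserts"  # semicolon-separated embedded labels
EMBEDDED_NAME_COL = "Resource Name"      # label column of the embedded template

FIELDS = [EMBEDDED_NAME_COL, "Main Resource Name", "Main Resource Type"]


def embedded_rows(main_csv_text, main_type):
    """Build one embedded-resource row per semicolon-separated label.

    Each row repeats the main resource's label and type so that it can be
    matched to the main resource that was ETL'd earlier.
    """
    rows = []
    for main_row in csv.DictReader(io.StringIO(main_csv_text)):
        labels = main_row.get(EMBEDDED_LABELS_COL) or ""
        for label in labels.split(";"):
            label = label.strip()  # "a; b" and "a;b" give the same labels
            if not label:
                continue  # skip empty entries, e.g. from a trailing ';'
            rows.append({
                EMBEDDED_NAME_COL: label,
                "Main Resource Name": main_row[MAIN_NAME_COL],
                "Main Resource Type": main_type,
            })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

For example, suppose a main row labelled `clone A` has `insert 1; insert 2` in the embedded-label column. That row yields two embedded rows, each with `clone A` as its Main Resource Name. The type passed as `main_type` must be the same type value used when ETLing the main resource.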

...

This Google Drive folder contains a few annotated examples of SWIFT templates with data.

Running the actual ETL

  • Determine the type of ETL command you need to use (create all new or replace/create). ETL commands are described in the SWIFT toolkit guide. The ETL process may run for a long time, depending on the number of rows in your input, the number of properties filled per row and the network speed. Plan accordingly; for example, leave it running overnight.
    • It is good practice to run a test ETL with a subset of rows to detect possible errors in the template data. Run it against a staging server if you have access to one, or against the eagle-i training node. If neither of these options is desirable, use your live eagle-i node, but ETL the data in the DRAFT state and de-ETL it afterwards.
  • We recommend running the ETL process with dedicated credentials (e.g. create an "automated curator" user in your eagle-i repository). This makes it easy to isolate bulk-loaded resources later.
    • The account used for the ETL needs a Level 4 role.
  • ETL the secondary resource type files (i.e. linked resources) first, then the primary resource type file.
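For the test ETL with a subset of rows, the sketch below copies the header and the first n data rows of a template into a new file. It assumes the template has been saved as CSV. `head_rows` and `write_test_subset` are hypothetical helpers, not part of the SWIFT toolkit. The sketch uses Python's `csv` module instead of a plain line count (e.g. `head -n`) so that quoted cells containing line breaks are not cut in half.

```python
import csv
import io


def head_rows(csv_text, n):
    """Return CSV text containing the header row plus the first n data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, row in enumerate(reader):
        if index > n:  # index 0 is the header, so stop after n data rows
            break
        writer.writerow(row)
    return out.getvalue()


def write_test_subset(src_path, dst_path, n=10):
    """Write a small test template next to the full one (paths are your choice)."""
    with open(src_path, newline="", encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        dst.write(head_rows(text, n))
```

For example, `write_test_subset("constructs.csv", "constructs_test.csv", n=10)` writes a ten-row test file. Both file names are hypothetical. You can then ETL that file with the command you chose above.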

Post ETL tasks

  • Examine the logs generated by the ETL command for possible errors. The ETL command generates one log file per ETL'd file, and each log indicates the row where an error occurred. You may need to re-ETL these rows.
  • If an ontology term is not resolved during the ETL process, a triple will be added with the predicate http://eagle-i.org/ont/datatools/1.0/temp_term_not_found and the literal value found in the data file.
    • Issue the following SPARQL query against your repository to find these instances, then correct them via the SWEET:

      ```sparql
      SELECT * WHERE { ?s <http://eagle-i.org/ont/datatools/1.0/temp_term_not_found> ?o }
      ```
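When the query returns many rows, grouping them by the unresolved literal makes recurring misspellings stand out. The sketch below assumes the results were saved in the W3C SPARQL 1.1 Query Results JSON format. Whether and how your repository's SPARQL endpoint returns that format is an assumption you should check. `group_unresolved_terms` and `summarize` are hypothetical helpers, and the binding names `s` and `o` follow the variables in the query above.

```python
import json
from collections import defaultdict


def group_unresolved_terms(results_json):
    """Map each unresolved literal (?o) to the resources (?s) that carry it.

    Assumes SPARQL 1.1 Query Results JSON for a query binding ?s and ?o.
    """
    data = json.loads(results_json)
    grouped = defaultdict(list)
    for binding in data["results"]["bindings"]:
        grouped[binding["o"]["value"]].append(binding["s"]["value"])
    return dict(grouped)


def summarize(grouped):
    """List (literal, number of resources) pairs, most frequent first."""
    return sorted(
        ((term, len(subjects)) for term, subjects in grouped.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
```

`summarize` puts the literals that affect the most resources first, which is a reasonable order in which to work through the corrections in the SWEET.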