...

  • Determine the type of ETL command that you need to use (create all new or replace/create). ETL commands are described in the SWIFT toolkit guide. 
  • Note that the ETL process may need to run for a long time (depending on the number of rows in your input, the number of properties filled per row, and the network speed), so plan accordingly, e.g. leave it running overnight.

    Tip
    It is good practice to run a test ETL with a subset of rows to detect possible errors in the template data. Run it against a staging server if you have access to one or against the eagle-i training node. If neither of these options is desirable, use your live eagle-i node but ETL the data in the DRAFT state and de-ETL it afterwards.
  • We recommend running the ETL process with dedicated credentials (e.g. create an "automated curator" user in your eagle-i repository), so that bulk-loaded resources can easily be isolated later. The ETL user needs a Level 4 role.
  • First ETL the secondary resource type files (i.e. linked resources), then the primary resource type file.

    Tip
    We find it useful to name the data subdirectories according to the order in which they need to be ETLd, e.g. for iPS Cells:
    1-human-subject
    2-diagnosis
    3-biological-specimen
    4-primary-cell-line
    5-induced-pluripotent-stem-cell-line
  • We recommend that you ETL resources into the CURATION state, verify that they were ETLd correctly, and then publish them using the bulk workflow command. You may ETL directly into the PUBLISH state if you are satisfied with your testing.
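
The numbered-subdirectory naming tip above can be turned into a small helper that lists the data subdirectories in the order they should be ETLd. This is only a sketch: the function names `etl_order` and `ordered_subdirs` are hypothetical, it assumes every subdirectory name starts with an integer followed by a hyphen, and it does not invoke any ETL command itself. It sorts on the numeric value of the prefix, so `10-...` comes after `9-...`.

```python
import re
from pathlib import Path


def etl_order(names):
    """Return directory names sorted by their leading integer prefix.

    Assumes names look like '3-biological-specimen' (hypothetical convention
    taken from the naming tip); raises ValueError otherwise.
    """
    def prefix(name):
        match = re.match(r"(\d+)-", name)
        if match is None:
            raise ValueError(f"no numeric prefix in {name!r}")
        return int(match.group(1))

    return sorted(names, key=prefix)


def ordered_subdirs(data_root):
    """List the subdirectories of data_root in ETL order."""
    names = [p.name for p in Path(data_root).iterdir() if p.is_dir()]
    return etl_order(names)
```

For example, calling `etl_order` on the five iPS Cells directory names in any order returns them from `1-human-subject` to `5-induced-pluripotent-stem-cell-line`, i.e. linked resources first and the primary resource type last.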

Post ETL tasks

  • Examine the logs generated by the ETL command for possible errors. A log file will be generated per ETLd file, and the row where the error occurred will be indicated. You may need to re-ETL these rows.
  • If an ontology term is not resolved during the ETL process, a triple will be added with the predicate http://eagle-i.org/ont/datatools/1.0/temp_term_not_found and the literal value found in the data file.
    • Issue the following SPARQL query against your repository to find these instances, then correct them via the SWEET:

       select * where {?s <http://eagle-i.org/ont/datatools/1.0/temp_term_not_found> ?o}
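
If the query results are saved in the standard SPARQL 1.1 Query Results JSON format, the unresolved terms can be listed with a short script before correcting them in the SWEET. The sketch below is illustrative only: `build_query` and `unresolved_terms` are hypothetical helper names, and how the query is submitted to your repository (endpoint URL, credentials) is not covered here and is left out on purpose.

```python
import json

# Predicate added by the ETL process when an ontology term is not resolved.
TERM_NOT_FOUND = "http://eagle-i.org/ont/datatools/1.0/temp_term_not_found"


def build_query(predicate=TERM_NOT_FOUND):
    """Build the SPARQL query that finds all triples with the given predicate."""
    return f"select * where {{?s <{predicate}> ?o}}"


def unresolved_terms(results_json):
    """Extract (resource URI, literal value) pairs from SPARQL JSON results.

    Assumes the W3C SPARQL 1.1 Query Results JSON layout, with the variables
    ?s and ?o bound in every result row.
    """
    data = json.loads(results_json)
    return [
        (binding["s"]["value"], binding["o"]["value"])
        for binding in data["results"]["bindings"]
    ]
```

Each returned pair gives the resource that needs correcting and the literal value from the data file that could not be matched to an ontology term.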

...