Comment: Added information about maps

...

This script creates (or reuses) two directories, ./maps and ./templates, which hold the generated maps and templates respectively. Do not modify them.

ETL

Warning

The ETLer expects data to be entered into one of the generated templates and a few conventions to be respected (see Preparing Input Data and Running an ETL Process). A data curator usually makes sure that the template is filled in correctly. In particular, the location of the resources to be ETLd (e.g. Lab or Core facility name) must be provided in every row of data.

...

Place your data files (i.e. the completed templates) in a directory of your choice, e.g. dataDirectory. All files contained in this directory will be processed by the ETLer.

...

 

  • A detailed report of the ETL results is generated in the ~/eagle-i/swift-2.0MS3.01/logs directory; please inspect it to verify that all rows were correctly uploaded. The RDF version of generated resources is also logged in this directory.
  • To further verify the data upload, log on to the SWEET application and select the lab to which the ETLd resources belong.
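One way to find the report after a run is to list the most recently written files in the logs directory. The sketch below assumes nothing about the report's file name or format; latest_logs is a hypothetical helper, not part of SWIFT, and the default path is simply the example directory mentioned above.

```bash
#!/usr/bin/env bash
# Sketch: show the newest files in the ETL logs directory.
# latest_logs is a hypothetical helper name, not a SWIFT command.
latest_logs() {
    # First argument: logs directory (defaults to the example path above).
    # Second argument: how many entries to show (defaults to 5).
    local log_dir="${1:-$HOME/eagle-i/swift-2.0MS3.01/logs}"
    local count="${2:-5}"

    if [ ! -d "$log_dir" ]; then
        echo "logs directory not found: $log_dir" >&2
        return 1
    fi

    # ls -t sorts by modification time, newest first.
    ls -t "$log_dir" | head -n "$count"
}
```

Calling latest_logs with no arguments lists up to five entries; open the newest report and check that every row was uploaded.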

Assumptions

  • Templates that were generated by generate-inputs.sh are completed and in a directory, e.g. dataDirectory.

    Note

    All files in the dataDirectory will be processed by the ETLer. Please be sure all secondary resource templates are in their own directories.

  • Maps (*.rdf) that were generated by generate-inputs.sh have been copied to the SWIFT executable directory's maps folder, e.g. ~/eagle-i/swift-2.0MS3.01/maps.
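The assumptions above can be checked before starting an ETL run. The following is a minimal sketch: check_etl_inputs is a hypothetical helper (not part of SWIFT) that only tests that the data directory contains at least one file and that the maps folder contains at least one *.rdf file. It does not validate the contents of either.

```bash
#!/usr/bin/env bash
# Sketch: pre-flight check for the two assumptions listed above.
# check_etl_inputs is a hypothetical helper name, not a SWIFT command.
check_etl_inputs() {
    local data_dir="$1" maps_dir="$2"
    local status=0

    if [ ! -d "$data_dir" ]; then
        echo "missing data directory: $data_dir" >&2
        return 1
    fi

    # Every file in the data directory is processed, so it must not be empty.
    if [ -z "$(find "$data_dir" -maxdepth 1 -type f | head -n 1)" ]; then
        echo "no files found in $data_dir" >&2
        status=1
    fi

    # The maps folder must hold at least one generated *.rdf map.
    if [ -z "$(find "$maps_dir" -maxdepth 1 -type f -name '*.rdf' 2>/dev/null | head -n 1)" ]; then
        echo "no *.rdf maps found in $maps_dir" >&2
        status=1
    fi

    return "$status"
}
```

For example, check_etl_inputs /path/to/dataDirectory ~/eagle-i/swift-2.0MS3.01/maps returns a non-zero status and prints a message if either location is empty.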

ETL to create new resources

ETL command for creating new resources (bash):
./ETLer.sh -d dataDirectory [-p DRAFT|CURATION|PUBLISH] -c username:password -r repositoryURL
  • This command will not attempt to determine whether matching resources already exist in the eagle-i repository; it is therefore not idempotent: if it is run twice with the same data file, duplicate resources will be created.
  • The value of the -p (promote) parameter indicates the desired workflow state for all resources. We recommend choosing CURATION, verifying that the resources were ETLd correctly, and then publishing using the bulk workflow command (see below). If you have already run a test ETL in a staging environment, choose PUBLISH directly.

...

  • To avoid classpath confusion, please use the fully qualified path for the dataDirectory.
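As an illustration of the points above, the sketch below resolves the data directory to a fully qualified path, checks the workflow state, and prints the resulting command without running it. print_create_command is a hypothetical helper, the repository URL in the usage note is a made-up placeholder, and username:password is deliberately left as literal placeholder text.

```bash
#!/usr/bin/env bash
# Sketch: print the "create new resources" command with an absolute data path.
# print_create_command is a hypothetical helper; it only echoes the command.
print_create_command() {
    local data_dir="$1" state="$2" repo_url="$3"
    local abs_dir

    # Only the three workflow states accepted by -p are allowed.
    case "$state" in
        DRAFT|CURATION|PUBLISH) ;;
        *) echo "workflow state must be DRAFT, CURATION or PUBLISH" >&2
           return 1 ;;
    esac

    # Resolve the fully qualified path of the data directory.
    abs_dir="$(cd "$data_dir" && pwd)" || return 1

    # username:password is a placeholder to be replaced by hand.
    printf './ETLer.sh -d %s -p %s -c username:password -r %s\n' \
        "$abs_dir" "$state" "$repo_url"
}
```

For example, print_create_command ./dataDirectory CURATION https://repository.example.org prints the command line to review before running it from the SWIFT executable directory.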

ETL to replace existing or create new resources

ETL command for replacing existing resources or creating new resources (bash):
./ETLer.sh -d dataDirectory [-p DRAFT|CURATION|PUBLISH] -c username:password -r repositoryURL -eid property-uri
  • Use this command if the input file represents resources that have been previously uploaded or created in eagle-i.

  • The value of the -eid parameter (external identifier) is the URI of a property that uniquely identifies the resource outside eagle-i. This property will be used to match the input to a resource in the eagle-i repository. Copy the property URI from the eagle-i ontology browser (expand the property name to see all information about a property). Example properties are:

    • Catalog number, -eid http://purl.obolibrary.org/obo/ERO_0001528
    • Inventory number, -eid http://purl.obolibrary.org/obo/ERO_0000044
    • RDFS label, use the shorthand syntax -eid label
  • If the ETL process finds a matching resource, it will replace all its properties with the values from the input file; the URI of the matched resource will be preserved.
  • If the ETL process does not find a matching resource, a new resource will be created.
  • The value of the -p (promote) parameter indicates the desired workflow state for newly created resources. Existing resources will retain their workflow state.
  • To avoid classpath confusion, please use the fully qualified path for the dataDirectory.
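The -eid values listed above can be kept in a small lookup so that the long URIs are not retyped. This is a sketch only: eid_for and print_replace_command are hypothetical helpers, the short names catalog and inventory are invented for this example, and the command is printed rather than executed. The data directory is assumed to be given as a fully qualified path already.

```bash
#!/usr/bin/env bash
# Sketch: map a short name to the -eid value from the examples above.
# eid_for and the short names are hypothetical, not part of SWIFT.
eid_for() {
    case "$1" in
        catalog)   echo "http://purl.obolibrary.org/obo/ERO_0001528" ;;
        inventory) echo "http://purl.obolibrary.org/obo/ERO_0000044" ;;
        label)     echo "label" ;;  # shorthand syntax for RDFS label
        *) echo "unknown identifier: $1" >&2
           return 1 ;;
    esac
}

# Print (do not run) the replace-or-create command for review.
print_replace_command() {
    local data_dir="$1" state="$2" repo_url="$3" eid
    eid="$(eid_for "$4")" || return 1

    # username:password is a placeholder to be replaced by hand.
    printf './ETLer.sh -d %s -p %s -c username:password -r %s -eid %s\n' \
        "$data_dir" "$state" "$repo_url" "$eid"
}
```

For example, print_replace_command /path/to/dataDirectory CURATION https://repository.example.org catalog prints a command that matches existing resources by catalog number; the URL here is a made-up placeholder.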

Practicing ETL

If you are practicing the ETL process, you may wish to upload your data to the common eagle-i training node. For example, if your directory is named dataDirectory and you wish to practice creating new resources, the script would be executed as follows (default workflow state is DRAFT):

...