Info

The toolkit described herein is currently not user-friendly (though it works well – we use it routinely to bulk-upload data). If you encounter issues, please do not hesitate to contact us.

SWIFT (Semantic Web Ingest from Tables) is a toolkit that allows experienced users to bulk-upload data into an eagle-i repository via ETL (Extract, Transform, and Load). The figure below is a high-level depiction of the ETL process. The toolkit accepts Excel spreadsheets and CSV files as input; both must conform to a SWIFT template (see below).

[Figure: high-level depiction of the ETL process (Gliffy diagram "etl-high-level")]


This guide provides an overview of the tasks involved in ETL and of how to use the SWIFT toolkit. The ETL workflow requires two roles: a person with domain knowledge and an understanding of the eagle-i resource ontology, who prepares the input files for optimal upload, and a person with basic knowledge of Unix, who runs the commands and troubleshoots potential errors. A detailed description is provided in the page Preparing Input Data and Running an ETL Process.

...

Warning

The ETLer expects data to be entered into one of the generated templates and a few conventions to be respected (see Preparing Input Data and Running an ETL Process). A data curator usually makes sure that the template is filled in correctly. In particular, the location of the resources to be ETLd (e.g. Lab or Core facility name) must be provided in every row of data.

...


  • A detailed report of the ETL results is generated in the ~/eagle-i/swift-2.0MS3.01/logs directory; please inspect it to verify that all rows were correctly uploaded. The RDF version of generated resources is also logged in this directory.
  • To further verify the data upload, log on to the SWEET application and select the lab to which the ETLd resources belong.
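The log check described above can be sketched as a pair of small shell helpers. This is only an illustration: the function names are hypothetical, and searching for the word "error" is an assumption, since the actual wording of the SWIFT report is not shown here. Adjust the search term to whatever the report uses.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the "error" search term are assumptions,
# not part of the SWIFT toolkit.

# Print the name of the most recently modified file in a directory.
newest_log() {
  local log_dir="$1"
  ls -t "$log_dir" 2>/dev/null | head -n 1
}

# Count the lines in a file that contain a pattern (case-insensitive).
count_matches() {
  local pattern="$1" file="$2"
  grep -ci -- "$pattern" "$file" || true
}

# Example: inspect the newest report in the logs directory named in this guide.
LOG_DIR="$HOME/eagle-i/swift-2.0MS3.01/logs"
latest="$(newest_log "$LOG_DIR")"
if [ -n "$latest" ]; then
  echo "Newest log: $LOG_DIR/$latest"
  echo "Lines mentioning 'error': $(count_matches error "$LOG_DIR/$latest")"
fi
```

A count of zero does not prove that every row was uploaded; the report itself should still be read.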

...

  • This command does not attempt to determine whether matching resources already exist in the eagle-i repository. It is therefore not idempotent: if it is run twice with the same data file, duplicate resources will be created.
  • The value of the -p (promote) parameter indicates the desired workflow state for all resources. We recommend choosing CURATION, verifying that the resources were ETLd correctly, and then publishing with the bulk workflow command (see below). If you have already run a test ETL in a staging environment, choose PUBLISH directly.
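Because the command is not idempotent, it can help to keep a local record of the data files that have already been loaded. The sketch below is not part of SWIFT: the ledger file name and all function names are hypothetical, and the check only knows about loads recorded on the same machine. It does not query the repository.

```shell
#!/usr/bin/env bash
# Sketch only: a local ledger of checksums of files already ETLd.
# The ledger location and helper names are assumptions for illustration.
LEDGER="${LEDGER:-$HOME/.swift_etl_ledger}"

# Print a content fingerprint ("<crc>-<size>") using POSIX cksum.
file_checksum() {
  cksum < "$1" | awk '{print $1 "-" $2}'
}

# Succeed if the file's fingerprint is already recorded in the ledger.
already_loaded() {
  local sum
  sum="$(file_checksum "$1")"
  [ -f "$LEDGER" ] && grep -qx -- "$sum" "$LEDGER"
}

# Append the file's fingerprint to the ledger after a successful load.
record_load() {
  file_checksum "$1" >> "$LEDGER"
}
```

A typical use would be to call already_loaded on the data file before running the ETL command and to stop if it succeeds, then to call record_load once the logs confirm that the upload worked.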

  • To avoid classpath confusion, please use the fully qualified path for the dataDirectory.
  • Make sure to use the full path of your directory, e.g. /Users/juliane/swift-4.3.0/mcow_ipsc/test
  • Make sure you are in your swift directory in your terminal when you execute the command.
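The two path checks above can be scripted before running a command. The following is a minimal sketch with hypothetical helper names. It relies only on cd, pwd and test, so it should behave the same on Linux and macOS. The script name passed to in_swift_dir is whichever SWIFT script you are about to run.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are assumptions, not part of the SWIFT toolkit.

# Print the absolute, symlink-free path of a directory.
# Prints nothing and fails if the directory does not exist.
abs_dir() {
  (cd "$1" 2>/dev/null && pwd -P)
}

# Succeed only if the current directory contains the named script.
in_swift_dir() {
  [ -f "./$1" ]
}
```

For example, `data_dir="$(abs_dir mcow_ipsc/test)"` yields a fully qualified path that can be passed as the dataDirectory, and `in_swift_dir deETLer.sh || echo "not in the swift directory"` flags a wrong working directory.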

ETL to replace existing or create new resources.

...

```bash
./deETLer.sh -f filename -c username:password -r repositoryURL
```

...