...
- an ETL Input Generator - command line program that generates spreadsheet templates and mapping files for the various resource types of the eagle-i resource ontology (e.g. a template/map for antibodies, for instruments, etc.)
- an ETLer - command line program that executes a bulk upload
- a deETLer - command line program that deletes a previous ETL upload
- Bulk workflow - command line program that executes workflow transitions on groups of resources, e.g. Publish, Return to curation, Withdraw
Prerequisites
The SWIFT Toolkit requires:
- A Unix-like environment including a terminal for executing commands
  - MacOS and Linux users don't need to install anything extra. On MacOS, use the Terminal app under Applications/Utilities
  - Cygwin is recommended for Windows users
- A Java 1.7 runtime environment
  - Execute the command {{java -version}} to find out which version you have.
  - If necessary, download the JRE directly from Oracle and follow the installation instructions.
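The Java check above can also be scripted. The following is a minimal sketch, not part of the SWIFT Toolkit; the function name java_version_of is hypothetical. It assumes the usual behaviour of the Java launcher, which prints its version to standard error with the version number in double quotes on the first line.

```shell
# Hypothetical helper (not part of the SWIFT Toolkit): extract the quoted
# version string from one line of `java -version` output.
java_version_of() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p'
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to standard error, hence the 2>&1 redirect.
  first_line=$(java -version 2>&1 | head -n 1)
  echo "Found Java $(java_version_of "$first_line")"
else
  echo "No java executable found on PATH"
fi
```

A reported version starting with 1.7 matches the runtime listed in the prerequisites.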
Download
The SWIFT Toolkit is packaged as a zip file and can be downloaded from our software repository.
...
No Format |
---|
mkdir ~/eagle-i |
unzip -d ~/eagle-i eagle-i-datatools-swift-2.0MS3.01-dist.zip |
cd ~/eagle-i/swift-2.0MS3.01 |
Available commands
Input generation
To generate ETL templates and maps, navigate to the toolkit directory (above) and run the script:
...
This script creates (or reuses) two directories: ./maps for the mapping files and ./templates for the spreadsheet templates.
ETL
Warning |
---|
The ETLer expects data to be entered into one of the generated templates and a few conventions to be respected (see Data preparation and ETL Workflow). A data curator usually makes sure that the template is filled in correctly. In particular, the location of the resources to be ETLd (e.g. Lab or Core facility name) must be provided in every row of data. |
- Place your input files (i.e. the completed templates) in a directory of your choice, e.g. dataDirectory. All files contained in this directory will be processed by the ETLer. To run an ETL, execute the following command. Note that all resources will be uploaded in the requested workflow state. We recommend choosing CURATION, verifying that the resources were ETLd correctly, and then publishing with the bulk workflow command (see below). If you have already run a test ETL in a staging environment, choose PUBLISH directly.
No Format ./ETLer.sh -d dataDirectory [-p DRAFT|CURATION|PUBLISH] -c username:password -r repositoryURL
Info If you are practicing the ETL process, you may wish to upload your data to the common eagle-i training node. In this case, if your directory is named dataDirectory, the script would be executed as follows (default workflow state is DRAFT):
No Format ./ETLer.sh -d dataDirectory -c L4:Level4 -r https://training.eagle-i.net
Note that data uploaded to the training node CAN be viewed and modified by others, even in a draft state and even if you subsequently lock the records. Note also that the information in the training node is not persistent, as the node is refreshed periodically.
- A detailed report of the ETL results is generated in the ./logs directory; please inspect it to verify that all rows were correctly uploaded. The RDF version of the generated resources is also logged in this directory.
- To further verify the data upload, log on to the SWEET application and select the lab to which the ETLd resources belong.
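Because the ETLer processes every file in the data directory and uploads resources in the requested workflow state, it can help to check both arguments before starting an upload. The following is a minimal sketch under stated assumptions: check_etl_args is a hypothetical helper, not part of the SWIFT Toolkit, and the only facts it relies on are the -d and -p options and the three workflow states shown in the ETLer usage line.

```shell
# Hypothetical pre-flight helper (not part of the SWIFT Toolkit).
# Usage: check_etl_args DATA_DIR STATE
# Succeeds only if DATA_DIR is a non-empty directory and STATE is one of
# the workflow states listed in the ETLer usage line.
check_etl_args() {
  dir=$1
  state=$2
  if [ ! -d "$dir" ]; then
    echo "Not a directory: $dir" >&2
    return 1
  fi
  if [ -z "$(ls -A "$dir")" ]; then
    echo "No input files in: $dir" >&2
    return 1
  fi
  case $state in
    DRAFT|CURATION|PUBLISH) return 0 ;;
    *) echo "Unknown workflow state: $state" >&2; return 1 ;;
  esac
}

# Possible use (placeholders as in the usage line above):
# check_etl_args dataDirectory CURATION &&
#   ./ETLer.sh -d dataDirectory -p CURATION -c username:password -r repositoryURL
```

The check does not inspect the contents of the templates; verifying that every row names its Lab or Core facility remains a curation step.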
De-ETL
Resources that are uploaded to an eagle-i repository via ETL are tagged with the name of the file from which they were extracted. It is therefore relatively simple to de-ETL an entire file. To do so, execute the following command:
No Format |
---|
./deETLer -f filename -c username:password -r repositoryURL |
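When an ETL run covered a directory with several input files, each file has to be de-ETLd separately. The following is a minimal sketch that only prints the commands so they can be reviewed before anything is deleted. The helper print_deetl_commands is hypothetical and not part of the SWIFT Toolkit, and it assumes that the tag stored in the repository is the base name of the input file, which this page does not confirm.

```shell
# Hypothetical helper (not part of the SWIFT Toolkit): print one de-ETL
# command per regular file in DATA_DIR, without executing anything.
# ASSUMPTION: resources are tagged with the file's base name.
# Usage: print_deetl_commands DATA_DIR REPOSITORY_URL
print_deetl_commands() {
  dir=$1
  repo=$2
  for path in "$dir"/*; do
    [ -f "$path" ] || continue
    # username:password is a placeholder, as in the command above.
    echo "./deETLer -f $(basename "$path") -c username:password -r $repo"
  done
}
```

Running print_deetl_commands dataDirectory repositoryURL lists the commands; each one can then be executed by hand with real credentials.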
Bulk Workflow
Execute the following command to perform workflow actions (e.g. send to curation, publish, unpublish) on all resources ETLd from a particular file (i.e. resources that are tagged with the filename in the eagle-i repository):
...