SHRINE 4.2.0 Chapter 10.3 - Generate SHRINE Lucene Search and Autosuggest Indices

These instructions show how to generate the Lucene search and autosuggest indices from the tables in the i2b2 i2b2metadata database. Typically one admin, usually the hub admin, will take responsibility for generating the indices and sharing them with all other nodes in the network.

Download the Lucene indexing tool, which is distributed as an executable jar file, from here:

https://repo.open.catalyst.harvard.edu/nexus/content/groups/public/net/shrine/shrine-lucene-indexer/4.1.0/shrine-lucene-indexer-4.1.0-jar-with-dependencies.jar

Create the input files:

Export the data in the TABLE_ACCESS table and the data from every table listed in the c_table_name column of TABLE_ACCESS. Any delimiter can be used in the exported files, but the default used by the indexing tool is '|'. Each exported data file should begin with a header line. A " (quotation mark) should be used as the escape character. Place the exported files for all tables into a single directory.
1. The exported TABLE_ACCESS is required to have the following header columns:
  1. c_table_cd
  2. c_table_name
  3. c_hlevel
  4. c_fullname
  5. c_name
  6. c_synonym_cd
  7. c_visualattributes
  8. c_basecode
  9. c_metadataxml
  10. c_tooltip

2. The other exported ontology tables are required to have the following header columns:
  1. c_hlevel
  2. c_fullname
  3. c_name
  4. c_synonym_cd
  5. c_visualattributes
  6. c_basecode
  7. c_metadataxml
  8. c_tooltip
  9. m_applied_path
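
Before running the indexer, it can help to confirm that each exported file carries the header columns listed above. The following is a minimal sketch, not part of the SHRINE tooling: the function name `missing_columns` is hypothetical, the files are assumed to be UTF-8 encoded, and header names are compared exactly as listed here (whether the indexer itself is case-sensitive is not stated in this chapter).

```python
import csv

# Column names copied from the two lists above.
TABLE_ACCESS_COLUMNS = [
    "c_table_cd", "c_table_name", "c_hlevel", "c_fullname", "c_name",
    "c_synonym_cd", "c_visualattributes", "c_basecode", "c_metadataxml",
    "c_tooltip",
]
ONTOLOGY_COLUMNS = [
    "c_hlevel", "c_fullname", "c_name", "c_synonym_cd", "c_visualattributes",
    "c_basecode", "c_metadataxml", "c_tooltip", "m_applied_path",
]


def missing_columns(path, required, delimiter="|"):
    """Return the required column names that are absent from the header line.

    Hypothetical helper: reads only the first line of the exported file,
    using '|' as the default delimiter and '"' as the quote character.
    Assumes UTF-8 encoding and exact (case-sensitive) name matching.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter, quotechar='"')
        header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

For example, `missing_columns("export/TABLE_ACCESS.txt", TABLE_ACCESS_COLUMNS)` (the file name is an assumption) returns an empty list when every required column is present, and otherwise lists the missing names in the order shown above.
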

Create a category definition file that defines the categories for the concepts in the ontology. Each line of the category definition file contains, in the following order: the concept path (i.e. a concatenation of the C_TABLE_CD and C_FULLNAME columns from TABLE_ACCESS), the code category (optional), the concept category, and the code set (optional). The separator for this file can be specified with the -r flag; by default it is a comma. The code category is the category under which the concept is displayed in the Medical Concept List. The concept category is the label displayed next to the concept in the query definition panel. The code set is the category in which the concept appears when filtering search results. There should be no header line in the category definition file.
1. Here is an example category definition file.
  Category Definition File
```
\\i2b2_DEMO\i2b2\Demographics,Demographics,Demographic
\\i2b2_DIAG\i2b2\Diagnoses,Diagnoses,Diagnosis,Diagnoses ICD9
\\i2b2_EXPR\i2b2\Expression Profiles Data,,Expression Profiles Data
\\i2b2_LABS\i2b2\Labtests,Labs,Laboratory Test
\\i2b2_MEDS\i2b2\Medications,Medications,Medication
\\i2b2_PROC\i2b2\Procedures,Procedures,Procedure
\\i2b2_PROV\i2b2\Providers,Providers,Provider
\\i2b2_REP\i2b2\Reports,Reports,Report
\\i2b2_VISIT\i2b2\Visit Details,Visit Details,Visit Detail
\\ICD10_ICD9\Diagnoses,Diagnoses,Diagnosis,Diagnoses ICD10
\\PHI\PHI,PHI,Protected Health Information
\\BIRN\BIRN,BIRN,BIRN
\\Custom Metadata\Custom Metadata,Custom Metadata
```
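
The field order described above can be illustrated with a small helper that assembles one line of the category definition file from TABLE_ACCESS values. This is only a sketch under stated assumptions: the function name `category_line` is hypothetical, and the leading double backslash and the removal of the trailing backslash from C_FULLNAME are inferred from the example file rather than documented behavior of the indexer.

```python
def category_line(table_cd, fullname, concept_category,
                  code_category="", code_set="", separator=","):
    """Build one category definition line (hypothetical helper).

    Field order: concept path, code category (optional),
    concept category, code set (optional).
    """
    # Assumption inferred from the example file: the concept path starts
    # with a double backslash and carries no trailing backslash.
    concept_path = "\\\\" + table_cd + fullname.rstrip("\\")
    fields = [concept_path, code_category, concept_category]
    # The code set is appended only when present, as in the example lines.
    if code_set:
        fields.append(code_set)
    return separator.join(fields)


if __name__ == "__main__":
    print(category_line("i2b2_DIAG", "\\i2b2\\Diagnoses\\",
                        "Diagnosis", "Diagnoses", "Diagnoses ICD9"))
```

An omitted code category is written as an empty field, which matches the Expression Profiles Data line in the example, where two consecutive commas follow the concept path.
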

Create the Lucene search index:

java -jar shrine-ontology-lucene-indexer-4.1.0-jar-with-dependencies.jar -o <insert exported data directory> -c <insert category definition file>

Create the Lucene autosuggest index:

java -jar shrine-ontology-lucene-indexer-4.1.0-jar-with-dependencies.jar -a -o <insert exported data directory> -c <insert category definition file>

Additional information

To specify a different delimiter for the exported ontology files, use the -p flag. In the following example a tab delimiter is specified instead of the default:

java -jar shrine-ontology-lucene-indexer-4.1.0-jar-with-dependencies.jar -p '\t' -o <insert exported data directory> -c <insert category definition file>

List all available command line options:

java -jar shrine-ontology-lucene-indexer-4.1.0-jar-with-dependencies.jar
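
When both indices are generated from a script, the invocations shown above can be assembled programmatically. The sketch below only builds the argument list; the function name `indexer_command` and its parameter names are hypothetical, and it uses only the flags that appear in this chapter (-a, -o, -c, -p, -r).

```python
def indexer_command(jar, data_dir, category_file, autosuggest=False,
                    delimiter=None, category_separator=None):
    """Return the java command line as a list of arguments (hypothetical helper)."""
    command = ["java", "-jar", jar]
    if autosuggest:
        command.append("-a")  # build the autosuggest index instead of the search index
    if delimiter is not None:
        command += ["-p", delimiter]  # delimiter of the exported ontology files
    if category_separator is not None:
        command += ["-r", category_separator]  # separator of the category definition file
    command += ["-o", data_dir, "-c", category_file]
    return command


if __name__ == "__main__":
    jar = "shrine-ontology-lucene-indexer-4.1.0-jar-with-dependencies.jar"
    # "\\t" passes the two characters backslash and t, as '\t' does in a shell.
    print(indexer_command(jar, "export", "categories.csv", delimiter="\\t"))
    print(indexer_command(jar, "export", "categories.csv", autosuggest=True))
```

The returned list can be passed to `subprocess.run(command, check=True)` to execute the tool. The directory and file names in the example are placeholders.
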
