
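| Table Name               | Used for Scrubbing | Used for Publication Analysis |
|--------------------------|--------------------|-------------------------------|
| Feature_matrix_test      | YES                |                               |
| Feature_matrix_train     | YES                |                               |
| Human_annotations_test   | YES                |                               |
| Human_annotations_train  | YES                |                               |
| Machine_annotations_test | YES                |                               |
| Machine_annotations_train| YES                |                               |
| Lookup_dictionary        | YES                |                               |
| Lookup_term_frequency    | YES                |                               |
| Lookup_umls              | YES                |                               |
| Pubs_authors             |                    | YES                           |
| Pubs_keywords            |                    | YES                           |
| Pubs                     |                    | YES                           |
| Pubs_refs                |                    | YES                           |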

Feature_matrix_*

Stores the feature matrix built from the Machine_annotations_* and Human_annotations_* tables. This is the rolled-up feature set used for classification.
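The roll-up can be pictured with the minimal sketch below. It assumes a hypothetical row shape, (token_id, feature_name) for machine annotations and token_id → class label for human annotations, and hypothetical feature and label names (`pos=NNP`, `in_name_dictionary`, `PHI`); the real table columns may differ. Per-token feature counts come from the pipeline output, and the human annotation supplies the class label.

```python
from collections import defaultdict

# Hypothetical rows; the real table columns may differ.
# Machine annotations: (token_id, feature_name) emitted by the pipeline.
machine_annotations = [
    (1, "pos=NNP"),
    (1, "in_name_dictionary"),
    (2, "pos=CD"),
    (2, "pos=CD"),
]
# Human annotations: token_id -> class label from manual review (hypothetical labels).
human_annotations = {1: "PHI", 2: "NOT_PHI"}


def build_feature_matrix(machine_rows, human_labels):
    """Roll up per-token feature counts and attach the human-assigned label."""
    counts = defaultdict(lambda: defaultdict(int))
    for token_id, feature in machine_rows:
        counts[token_id][feature] += 1
    matrix = []
    for token_id in sorted(counts):
        row = {"token_id": token_id, **counts[token_id]}
        # Tokens without a human annotation get no label (None).
        row["label"] = human_labels.get(token_id)
        matrix.append(row)
    return matrix


for row in build_feature_matrix(machine_annotations, human_annotations):
    print(row)
```

Keeping one row per token with sparse feature counts mirrors the idea of a single rolled-up table that a classifier can read directly; separate _train and _test copies would simply be built from the corresponding annotation tables.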

Human_annotations_*

Stores all annotations created by humans as part of a manual annotation effort.

Machine_annotations_*

Stores all annotations created by the UIMA pipeline.

Lookup_dictionary

Contains names from the 1990 US census that are used in

Lookup_term_frequency

Contains term frequencies calculated across a random sample of 10,000 open-access medical publications. The raw open-access publications are freely available from NIH/NLM.
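A corpus-wide term frequency table of this kind can be sketched as follows. This is an illustration only: the lowercase alphanumeric tokenizer, the two-sentence stand-in corpus, and the `relative_frequency` helper are assumptions, not the procedure actually used to build Lookup_term_frequency.

```python
import re
from collections import Counter


def term_frequencies(documents):
    """Count how often each lowercase word token occurs across all documents."""
    counts = Counter()
    for text in documents:
        # Simple assumed tokenizer: runs of letters/digits; the real pipeline may differ.
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def relative_frequency(counts, term):
    """Fraction of all corpus tokens that are `term` (0.0 for an empty corpus)."""
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


# Stand-in corpus; the real table was computed over 10,000 publications.
corpus = [
    "The patient was given aspirin.",
    "Aspirin reduces the risk of stroke.",
]
tf = term_frequencies(corpus)
print(tf["aspirin"])  # 2
print(relative_frequency(tf, "the"))
```

A `Counter` returns 0 for unseen terms, so lookups of words absent from the sample corpus need no special handling.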

Lookup_umls

Contains terms from the UMLS subset that was used by Scrubber.

This table DOES NOT include the UMLS concept unique identifiers (CUIs) due to licensing restrictions.
