Reference Docs

User Guide
Human annotation with Protege
Machine Annotations and Data Dictionary

Attachment (Excel): scrubber_classification_table.xls

3.X is a new vision for the scrubber. As we approached diminishing returns from improving regexes and whitelists/blacklists, we shifted to a machine learning approach that learns from large bodies of medical information in publications and UMLS dictionaries.

Precondition:

Steps:

Block of text is sent to cTAKES
cTAKES processing
1. Start and end position of every POS-tagged token
2. Part of speech
  1. Nouns are of most interest because they are the most likely to carry PHI
3. Open question: are CUIDs associated with words or with phrases?

Post-condition:

The input document (either a medical note or a publication) will have POS tags and UMLS CUIDs.
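The post-condition above can be pictured as a list of annotated tokens. The following is a minimal sketch only: the `TokenAnnotation` class, its field names, and the hand-written sample annotations are assumptions made for illustration and are not the cTAKES type system or API. It also assumes Penn Treebank-style POS tags, where noun tags start with `NN`, and uses a placeholder CUID rather than a real UMLS identifier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TokenAnnotation:
    """Hypothetical record for one token after annotation (not a cTAKES class)."""
    start: int                 # character offset where the token starts
    end: int                   # character offset just past the token
    text: str                  # covered text
    pos: str                   # part-of-speech tag (Penn Treebank style assumed)
    cuids: List[str] = field(default_factory=list)  # UMLS concept ids, if any


def nouns(tokens: List[TokenAnnotation]) -> List[TokenAnnotation]:
    """Keep only noun tokens, the ones of most interest for PHI."""
    return [t for t in tokens if t.pos.startswith("NN")]


# Hand-written annotations for illustration; a real run would get these
# from the annotation pipeline. "C0000000" is a placeholder, not a real CUID.
text = "Mr. Smith has diabetes"
tokens = [
    TokenAnnotation(0, 3, "Mr.", "NNP"),
    TokenAnnotation(4, 9, "Smith", "NNP"),
    TokenAnnotation(10, 13, "has", "VBZ"),
    TokenAnnotation(14, 22, "diabetes", "NN", ["C0000000"]),
]

noun_texts = [t.text for t in nouns(tokens)]
```

Whether a CUID attaches to a single word or to a phrase is still the open question listed in the steps; this sketch attaches it to a token purely to keep the example small.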

Precondition:

Steps:

Each "scrubber" implementation produces Recorder output
1. Passthrough Impl
  1. Regex
  2. Word lists
2. cTAKES Impl (OpenNLP)
  1. Noun phrases
  2. UMLS CUIDs
Performance evaluation (ROC)
1. Scrubber standalone
2. Scrubber word lists limited by detected noun phrases
3. Scrubber word lists limited by detected noun phrases and non-UMLS concepts
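The three evaluation configurations above can be compared as points in ROC space (false positive rate against true positive rate). The sketch below is an illustration under stated assumptions, not the scrubber's implementation: the `Token` class, the `WORD_LIST` contents, the predicate names, and the five sample tokens are all hypothetical, and the numbers it yields are toy values rather than measured results. A full ROC curve would need a score threshold to sweep; here each configuration is a binary classifier and so contributes a single point.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Token:
    """Hypothetical evaluation record for one token."""
    text: str
    in_noun_phrase: bool                             # from noun phrase detection
    cuids: List[str] = field(default_factory=list)   # UMLS concept ids, if any
    is_phi: bool = False                             # gold label from human annotation


WORD_LIST = {"smith", "may", "parkinson"}  # made-up name list for illustration


def standalone(t: Token) -> bool:
    """Configuration 1: word list match only."""
    return t.text.lower() in WORD_LIST


def np_limited(t: Token) -> bool:
    """Configuration 2: word list match inside a detected noun phrase."""
    return standalone(t) and t.in_noun_phrase


def np_non_umls(t: Token) -> bool:
    """Configuration 3: as configuration 2, and the token has no UMLS concept."""
    return np_limited(t) and not t.cuids


def roc_point(tokens: List[Token], predict: Callable[[Token], bool]) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate) for one configuration."""
    tp = sum(1 for t in tokens if predict(t) and t.is_phi)
    fp = sum(1 for t in tokens if predict(t) and not t.is_phi)
    fn = sum(1 for t in tokens if not predict(t) and t.is_phi)
    tn = sum(1 for t in tokens if not predict(t) and not t.is_phi)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fpr, tpr


# Toy gold-standard tokens; "C0000000" is a placeholder, not a real CUID.
tokens = [
    Token("Smith", True, [], True),
    Token("may", False, [], False),
    Token("Parkinson", True, ["C0000000"], False),
    Token("disease", True, ["C0000000"], False),
    Token("has", False, [], False),
]

points = {
    name: roc_point(tokens, fn)
    for name, fn in [("standalone", standalone),
                     ("np_limited", np_limited),
                     ("np_non_umls", np_non_umls)]
}
```

In this toy data each added filter removes one false positive without losing the true positive, which is the effect the second and third configurations are meant to test on real annotated notes.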

Post-condition:

...