...

3.X is a new vision for the scrubber.
As we approached "diminishing returns" for improving the REGEX and whitelists/black lists, we have shifted towards a statistical approach, learning from large bodies of medical information in publications and UMLS dictionaries.

...

Venn Diagram

!scrubber-3.0-venn.jpg|border=1!

Use Case: Tagging Noun Phrases and UMLS concepts

Precondition:

  • Training Data: GENIA, Penn Treebank, Mayo Source
  • Software: cTakes, using its POS tagger and UMLS CUID extractor features

...

  1. A block of text is sent to cTakes.
  2. cTakes processes the text and reports:
    1. The start and end position of every POS-tagged token
    2. The part of speech of each token
      1. Nouns are of most interest, because that is where PHI is expected to appear.
    3. Open question: are CUIDs associated with words or with phrases?

Post-condition:

  • The input document (either a medical note or a publication) is annotated with POS tags and UMLS medical concept CUIDs.
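
As a rough illustration of this post-condition, the annotations could be held in a structure like the one below. This is only a sketch: `TokenAnnotation`, `noun_candidates`, the sample sentence and the placeholder CUID are all hypothetical, and no cTakes API is called. Tags are assumed to follow the Penn Treebank tagset, where every noun tag starts with `NN`. Attaching CUIDs to a character span leaves the word-versus-phrase question open, since a span can cover either.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TokenAnnotation:
    """One annotated span of the input document (hypothetical structure)."""
    start: int                 # character offset where the span begins
    end: int                   # character offset just past the span
    text: str                  # the covered text
    pos: str                   # part-of-speech tag, assumed Penn Treebank style
    cuids: List[str] = field(default_factory=list)  # UMLS concept ids, if any


def is_noun(tag: str) -> bool:
    """Penn Treebank noun tags are NN, NNS, NNP and NNPS."""
    return tag.startswith("NN")


def noun_candidates(annotations: List[TokenAnnotation]) -> List[TokenAnnotation]:
    """Keep only the nouns, which are the spans of most interest for PHI."""
    return [a for a in annotations if is_noun(a.pos)]


# Hand-written sample data standing in for tagger output.
SAMPLE_TEXT = "Mr. Smith has diabetes."
ANNOTATIONS = [
    TokenAnnotation(0, 3, "Mr.", "NNP"),
    TokenAnnotation(4, 9, "Smith", "NNP"),
    TokenAnnotation(10, 13, "has", "VBZ"),
    # "CUID-PLACEHOLDER" is not a real UMLS identifier.
    TokenAnnotation(14, 22, "diabetes", "NN", ["CUID-PLACEHOLDER"]),
    TokenAnnotation(22, 23, ".", "."),
]

# Sanity check: every offset pair must point at the text it claims to cover.
for ann in ANNOTATIONS:
    assert SAMPLE_TEXT[ann.start:ann.end] == ann.text
```

Calling `noun_candidates(ANNOTATIONS)` on the sample keeps the spans for "Mr.", "Smith" and "diabetes" and drops the verb and the punctuation.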

Use Case: Meta-analysis of text

...

  • Text is processed by more than one algorithm, in the style of "ham vs spam" classification.
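
One possible reading of "more than one algorithm" is that several independent checks each label a token and the labels are then combined. The sketch below assumes a simple majority vote. The three checks, their word lists and the `classify` helper are hypothetical stand-ins, not the scrubber's actual algorithms.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A classifier takes a token and returns "PHI" or "NOT_PHI".
Classifier = Callable[[str], str]


def in_name_list(token: str) -> str:
    """Dictionary check against a (hypothetical) list of known names."""
    return "PHI" if token.lower() in {"smith", "jones"} else "NOT_PHI"


def is_capitalized(token: str) -> str:
    """Crude pattern check: capitalized tokens are treated as suspicious."""
    return "PHI" if token[:1].isupper() else "NOT_PHI"


def common_in_publications(token: str) -> str:
    """Tokens seen in (hypothetical) publication text are assumed safe."""
    return "NOT_PHI" if token.lower() in {"diabetes", "patient", "the"} else "PHI"


CLASSIFIERS: List[Classifier] = [in_name_list, is_capitalized, common_in_publications]


def classify(token: str, classifiers: List[Classifier]) -> Tuple[str, Dict[str, int]]:
    """Return the majority label and the raw vote counts for one token."""
    votes = Counter(clf(token) for clf in classifiers)
    label = "PHI" if votes["PHI"] > votes["NOT_PHI"] else "NOT_PHI"
    return label, dict(votes)
```

With these stand-ins, `classify("Smith", CLASSIFIERS)` is labelled `PHI` by all three checks, while `classify("Diabetes", CLASSIFIERS)` is outvoted two to one and comes out `NOT_PHI`. Keeping the raw vote counts alongside the label makes it possible to tabulate where the algorithms disagree.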

...

Example

Office Excel
namescrubber_classification_table.xls