Overview

3.X is a new vision for the scrubber.
As the regex-and-dictionary approach reaches diminishing returns, we are moving towards statistical methods that learn from large bodies of medical information, such as publications and the UMLS dictionaries.

[Figure: 3.X Diagram (In Progress)]

Use Case: Tagging Noun Phrases and UMLS concepts

Precondition:

  • Training Data: GENIA, Penn Treebank, Mayo Source
  • Software: cTAKES, using its POS tagger and UMLS CUI extractor features

...

  • Input document (either a medical note or a publication) will be POS-tagged and annotated with UMLS CUIs.
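
As a rough illustration of what a tagged document could look like downstream, the sketch below models each token with a POS tag and an optional UMLS concept identifier, then groups tokens into noun phrases with a deliberately naive chunker. It does not call cTAKES; the `TaggedToken` type, the `noun_phrases` helper, the sample sentence, and the `C_EXAMPLE` identifier are all hypothetical placeholders, and the POS tags follow the Penn Treebank tag set.

```python
from dataclasses import dataclass
from typing import List, Optional

# Penn Treebank tags treated as nouns and as noun modifiers (a simplification).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MODIFIER_TAGS = {"JJ"}


@dataclass
class TaggedToken:
    """Hypothetical record for one token after tagging."""
    text: str
    pos: str                   # Penn Treebank POS tag
    cui: Optional[str] = None  # UMLS concept identifier, if one was found


def noun_phrases(tokens: List[TaggedToken]) -> List[List[TaggedToken]]:
    """Group consecutive adjective/noun tokens into phrases.

    A run is kept only if it contains at least one noun.
    """
    phrases: List[List[TaggedToken]] = []
    current: List[TaggedToken] = []
    # The trailing None is a sentinel that flushes the last run.
    for tok in list(tokens) + [None]:
        if tok is not None and tok.pos in (NOUN_TAGS | MODIFIER_TAGS):
            current.append(tok)
            continue
        if any(t.pos in NOUN_TAGS for t in current):
            phrases.append(current)
        current = []
    return phrases


# Toy input; "C_EXAMPLE" is a placeholder, not a real UMLS identifier.
tokens = [
    TaggedToken("Patient", "NN"),
    TaggedToken("denies", "VBZ"),
    TaggedToken("chest", "NN", cui="C_EXAMPLE"),
    TaggedToken("pain", "NN", cui="C_EXAMPLE"),
]

for phrase in noun_phrases(tokens):
    print(" ".join(t.text for t in phrase), [t.cui for t in phrase])
```

A real pipeline would take the tags and identifiers from the tagger's output rather than hard-coding them, and would likely use a trained chunker instead of this tag-run heuristic.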

Use Case: Meta-analysis of text

Precondition:

  • Tagging Noun Phrases
  • Scrubber configured (with or without local dictionary/regex modifications)

...

  • Text is processed by more than one algorithm, in the manner of "ham vs. spam" classification.
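
To make the "ham vs. spam" idea concrete, here is a minimal sketch of one such algorithm: a two-class multinomial naive Bayes classifier with add-one smoothing, of the kind commonly used in spam filtering. This assumes that "ham vs. spam" refers to that style of statistical classification; the `train` and `classify` functions, the labels, and the toy training examples are hypothetical and are not taken from the scrubber itself.

```python
import math
from collections import Counter


def train(examples):
    """Build a naive Bayes model from (tokens, label) pairs."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training examples
    vocab = set()
    for tokens, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(model, tokens):
    """Return the label with the highest log posterior for the tokens."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Log prior plus add-one smoothed log likelihood of each token.
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokens:
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented purely for illustration.
examples = [
    (["patient", "reports", "chest", "pain"], "ham"),
    (["john", "smith", "dob", "1970"], "spam"),
]
model = train(examples)
print(classify(model, ["chest", "pain"]))
print(classify(model, ["john"]))
```

Running several classifiers of this kind over the same text and comparing their labels is one way the meta-analysis step could be organized; how the results are combined is left open here.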

Proof of Principle: Demo

namescrubber_classification_table.xls