...

McMurry* AJ, Fitch* B, Savova G, Kohane IS, Reis BY. "Improved de-identification of physician notes through integrative modeling of both identifying and non-identifying medical text." BMC Medical Informatics and Decision Making. Accepted with minor revisions, Jan 2013.

...



Venn Diagram

[Figure: Venn diagram (image not included)]

Example text block -> Feature Set

Attachment (Office Excel): namescrubber_classification_table.xls

3.X is a new vision for the scrubber. As we approached diminishing returns from improving regular expressions and whitelists/blacklists, we shifted toward a machine learning approach that learns from large bodies of medical text drawn from publications and UMLS dictionaries.

...

System Use Cases

...

Use Case: Tagging Noun Phrases and UMLS concepts

...

Precondition:

  • Training Data
  • Software: cTakes, using its part-of-speech tagger and UMLS CUID extractor

Steps:

  1. A block of text is sent to cTakes
  2. cTakes processes the text and reports
    1. start & end position of every POS tag
    2. part of speech
      1. Nouns are of most interest, because PHI such as names typically appears as nouns
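
The steps above can be sketched as follows. This is a minimal illustration, not the scrubber's actual code: it does not call cTakes, and the `TaggedToken` type, the `noun_spans` helper, and the hand-written tags are all hypothetical. It assumes the tagger reports character offsets and Penn Treebank-style tags (NN, NNS, NNP, NNPS for nouns).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: Penn Treebank-style noun tags.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


@dataclass(frozen=True)
class TaggedToken:
    """Hypothetical record for one POS tag: character offsets plus the tag."""
    start: int  # inclusive start offset in the text block
    end: int    # exclusive end offset in the text block
    pos: str    # part-of-speech tag


def noun_spans(text: str, tokens: List[TaggedToken]) -> List[Tuple[int, int, str]]:
    """Return (start, end, surface text) for every token tagged as a noun."""
    return [(t.start, t.end, text[t.start:t.end])
            for t in tokens if t.pos in NOUN_TAGS]


# Hand-tagged example standing in for tagger output (illustrative only).
text = "Patient seen by Dr. Smith in Boston."
tokens = [
    TaggedToken(0, 7, "NN"),     # Patient
    TaggedToken(8, 12, "VBN"),   # seen
    TaggedToken(13, 15, "IN"),   # by
    TaggedToken(16, 19, "NNP"),  # Dr.
    TaggedToken(20, 25, "NNP"),  # Smith
    TaggedToken(26, 28, "IN"),   # in
    TaggedToken(29, 35, "NNP"),  # Boston
    TaggedToken(35, 36, "."),    # sentence-final period
]

candidates = noun_spans(text, tokens)
for start, end, surface in candidates:
    print(start, end, surface)
```

Keeping the offsets alongside the tag lets later steps map a candidate back to the exact characters to scrub in the original text block.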

Post-condition:

...

Use Case: Meta-analysis of text

...

Precondition:

  • Tagging Noun Phrases
  • Scrubber configured (with or without local dictionary/regex mods)

Steps:

  1. Each "scrubber" implementation produces Recorder output
    1. Passthrough Impl
      1. Regex
      2. Word lists
    2. cTakes Impl (OpenNLP)
      1. Noun Phrases
      2. UMLS cuids
  2. Performance evaluation (ROC)
    1. Scrubber standalone
    2. Scrubber word lists limited by detected noun phrases
    3. Scrubber word lists limited by detected noun phrases and non-UMLS concepts
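
The three evaluation configurations can be sketched as follows. This is a hedged illustration under stated assumptions, not the project's evaluation code: the `Token` record, its boolean feature names, and the five toy tokens are invented for the example, and each configuration yields a single operating point (false positive rate, true positive rate) rather than a full ROC curve.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical per-token features collected from the scrubber runs."""
    text: str
    in_word_list: bool      # matched a scrubber word list
    in_noun_phrase: bool    # inside a detected noun phrase
    has_umls_concept: bool  # mapped to a UMLS concept
    is_phi: bool            # gold-standard label


def standalone(t: Token) -> bool:
    return t.in_word_list


def limited_by_np(t: Token) -> bool:
    return t.in_word_list and t.in_noun_phrase


def limited_by_np_non_umls(t: Token) -> bool:
    return t.in_word_list and t.in_noun_phrase and not t.has_umls_concept


def roc_point(tokens: List[Token],
              predict: Callable[[Token], bool]) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate) for one configuration."""
    positives = sum(1 for t in tokens if t.is_phi)
    negatives = len(tokens) - positives
    tp = sum(1 for t in tokens if predict(t) and t.is_phi)
    fp = sum(1 for t in tokens if predict(t) and not t.is_phi)
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return fpr, tpr


# Toy tokens, invented for illustration only (not real evaluation data).
toy = [
    Token("Smith", True, True, False, True),
    Token("Parkinson", True, True, True, False),  # name-like, but a disease term
    Token("may", True, False, False, False),      # name-like, but not in a noun phrase
    Token("Boston", True, True, False, True),
    Token("aspirin", False, True, True, False),
]

for name, rule in [("standalone", standalone),
                   ("noun phrases", limited_by_np),
                   ("noun phrases, non-UMLS", limited_by_np_non_umls)]:
    fpr, tpr = roc_point(toy, rule)
    print(f"{name}: FPR={fpr:.2f} TPR={tpr:.2f}")
```

Expressing each configuration as a predicate over the same token records keeps the comparison fair: only the filtering rule changes between runs, not the input or the gold labels.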

Post-condition:

  • Text is processed by more than one algorithm (a "ham vs. spam" style classification)

...

...