...
3.X is a new vision for the scrubber.
As we approached "diminishing returns" from improving the regex and dictionary (whitelist/blacklist) approach, we have shifted towards a statistical approach that learns from large bodies of medical information in publications and UMLS dictionaries.
...
Venn Diagram: !scrubber-3.0-venn.jpg|border=1!
Use Case: Tagging Noun Phrases and UMLS concepts
Precondition:
- Training Data: GENIA, Penn Treebank, Mayo Source
- Software: cTakes, using its POS tagger and UMLS CUID extractor features
...
- Block of text is sent to cTakes
- cTakes processing
- start & end position of all POS tags
- part of speech
- Most interested in nouns, because PHI (e.g. names) tends to appear as nouns
- Need info: are CUIDs associated with words or with phrases?
Post-condition:
- The input document (either a medical note or a publication) will have POS tags and UMLS medical concept CUIDs.
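The output of this use case can be pictured as a list of tagged tokens. The following is a minimal Python sketch, not the cTakes API: the `TaggedToken` class, the `noun_tokens` helper, the sample sentence, and the placeholder CUID are all hypothetical and written by hand for illustration. It assumes Penn Treebank style tags, where noun tags start with "NN", and it attaches the CUID to a single word, which is exactly the open question in the "Need info" item above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaggedToken:
    """One token as it might look after tagging (hypothetical structure)."""
    start: int                  # character offset where the token begins
    end: int                    # character offset just past the token
    text: str                   # surface form
    pos: str                    # part-of-speech tag (Penn Treebank style assumed)
    cuid: Optional[str] = None  # UMLS concept id, if one was found


def noun_tokens(tokens: List[TaggedToken]) -> List[TaggedToken]:
    """Keep only nouns; Penn Treebank noun tags all start with 'NN'."""
    return [t for t in tokens if t.pos.startswith("NN")]


# Hand-written sample, standing in for real tagger output.
TEXT = "Mr. Smith has diabetes."
TOKENS = [
    TaggedToken(0, 3, "Mr.", "NNP"),
    TaggedToken(4, 9, "Smith", "NNP"),
    TaggedToken(10, 13, "has", "VBZ"),
    # "C0000000" is a placeholder, not a real UMLS identifier.
    TaggedToken(14, 22, "diabetes", "NN", cuid="C0000000"),
    TaggedToken(22, 23, ".", "."),
]

if __name__ == "__main__":
    for t in noun_tokens(TOKENS):
        print(t.start, t.end, t.text, t.pos, t.cuid)
```

In this sample the nouns are "Mr.", "Smith", and "diabetes"; only the last carries a concept id.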
Use Case: Meta-analysis of text
...
- Text is processed by more than one algorithm ("ham vs. spam" classification)
...
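One possible reading of "ham vs spam" is a spam-filter style scorer applied to tokens. The sketch below is an assumption, not the scrubber's actual method: it treats tokens seen in publication-like text as "ham" and tokens seen in PHI-like text as "spam", and scores a token by its add-one smoothed log-odds between the two. The function names and the tiny training lists are hypothetical, and this would be only one of the several algorithms the use case mentions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def train(docs: List[Tuple[str, List[str]]]) -> Dict[str, Counter]:
    """Count lowercased token frequencies per class ('ham' or 'spam')."""
    counts = {"ham": Counter(), "spam": Counter()}
    for label, tokens in docs:
        counts[label].update(t.lower() for t in tokens)
    return counts


def log_odds(token: str, counts: Dict[str, Counter]) -> float:
    """Smoothed log-odds that a token is 'spam' rather than 'ham'.

    Positive values lean 'spam', negative values lean 'ham'.
    """
    vocab_size = len(set(counts["ham"]) | set(counts["spam"]))

    def prob(label: str) -> float:
        c = counts[label]
        # Add-one smoothing so unseen tokens never get probability zero.
        return (c[token.lower()] + 1) / (sum(c.values()) + vocab_size)

    return math.log(prob("spam") / prob("ham"))


# Tiny hand-made training data (hypothetical, for illustration only).
DOCS = [
    ("ham", ["patient", "with", "diabetes", "mellitus"]),
    ("spam", ["mr", "smith", "seen", "by", "dr", "jones"]),
]
COUNTS = train(DOCS)

if __name__ == "__main__":
    for word in ["Smith", "diabetes"]:
        print(word, round(log_odds(word, COUNTS), 3))
```

With this toy data "Smith" scores positive and "diabetes" scores negative; real use would need much larger corpora on both sides.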
Example
[Embedded Office Excel spreadsheet]