Note: Scrubber 3.X is being ported to Apache cTAKES; this is an interim BETA release.
1. Intended usages
1.1 Default configuration
We recommend starting with the default properties and the prebuilt train/test models.
scrubber.properties: all supported config options and features in one place. Apache UIMA, Apache cTAKES, and WEKA distribution jars are loaded dynamically.
1.2 Customize NLP pipeline
- Scrubber uses Apache UIMA and Apache cTAKES packages, which together provide the NLP pipeline for lexical parsing and medical concept annotation. Generated feature sets are exported to the SQL database or model file (CSV, ARFF). The UIMA and cTAKES services used by Scrubber are defined and configured using scrubber.properties.
1.3 Customize Classifier
- Scrubber can use different classifier implementations without recompiling the software.
- By default, Scrubber dynamically loads the popular WEKA C4.5 decision tree classifier with multi-class support.
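Swapping classifiers without recompiling usually comes down to naming the implementation class in configuration and instantiating it by reflection. The following is a minimal sketch of that idea only. The `TokenClassifier` interface, the `DigitsAsPhiClassifier` toy class, and the `classifier.class` key are hypothetical names invented for illustration; they are not Scrubber's or WEKA's actual API.

```java
import java.util.Properties;

public class ClassifierLoaderSketch {

    /** Hypothetical plugin contract; not Scrubber's or WEKA's actual interface. */
    public interface TokenClassifier {
        String classify(String token);
    }

    /** Toy implementation that exists only to make the sketch runnable. */
    public static class DigitsAsPhiClassifier implements TokenClassifier {
        @Override
        public String classify(String token) {
            // Purely illustrative rule: any token containing a digit is labeled PHI.
            return token.chars().anyMatch(Character::isDigit) ? "PHI" : "NON_PHI";
        }
    }

    /** Instantiates the class named under the (hypothetical) key "classifier.class". */
    public static TokenClassifier load(Properties props) {
        String className = props.getProperty("classifier.class");
        if (className == null) {
            throw new IllegalStateException("classifier.class is not set");
        }
        try {
            Class<?> clazz = Class.forName(className);
            return (TokenClassifier) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load classifier " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("classifier.class", DigitsAsPhiClassifier.class.getName());
        TokenClassifier classifier = load(props);
        System.out.println(classifier.classify("MRN12345"));
        System.out.println(classifier.classify("aspirin"));
    }
}
```

With this pattern, changing the class name in the properties file is enough to select a different implementation, provided its jar is on the classpath.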
2. Software Features
2.1 Annotation
- Annotate word tokens and redact PHI from physician notes
- cTAKES lexical parsing and medical dictionary annotation
- WEKA multi-class decision tree classifier (plugin default)
- Protege UI support for human expert curators (reads output)
- Generate feature sets containing lexical properties, medical concept codes, and human-defined rules
- Compare lexical properties of public and private text sources
- Distinguish private patient data from coded medical concepts and commonly used words
2.2 Models
- Prebuilt train and test models can be imported to Weka (default), Matlab, or R
- (default) Test your local physician notes without retraining
- (optional) Retrain the model using local physician note samples, publications, and medical dictionaries.
2.3 Classification
- Distinguish (classify) private patient data from coded medical concepts and commonly used words
2.4 Compare Text
- Compare lexical properties and distributions of public and private text sources
3. How To
3.X Install / Train / Test / Scrub
Scrubber Property KEY = VALUE
4. scrubber.properties
4.1 Java Object
- ScrubberProperties.java statically binds scrubber.properties at startup
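As a rough illustration of binding KEY = VALUE settings to a Java object, the sketch below loads properties text with `java.util.Properties` and exposes typed accessors. It is not the actual ScrubberProperties.java: the class name and the keys `dir.input` and `classifier.name` are assumptions made for the example, and the text is parsed from a string so the sketch runs without a file on disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesBindingSketch {
    private final Properties props = new Properties();

    /** Parses KEY = VALUE text; a real setup would read scrubber.properties from disk. */
    public PropertiesBindingSketch(String propertiesText) {
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the trimmed value, or fails fast when a required key is missing. */
    public String require(String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }

    /** Returns the trimmed value, or the fallback when the key is absent. */
    public String getOrDefault(String key, String fallback) {
        return props.getProperty(key, fallback).trim();
    }

    public static void main(String[] args) {
        // Hypothetical keys, used only for illustration.
        String text = "dir.input = data/input\n# a comment line\nclassifier.name=J48\n";
        PropertiesBindingSketch config = new PropertiesBindingSketch(text);
        System.out.println(config.require("dir.input"));
        System.out.println(config.getOrDefault("dir.output", "data/output"));
    }
}
```

Failing fast on a missing required key surfaces configuration mistakes at startup rather than midway through a run.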
4.2 Java Template
- TemplateFileProcessor.java handles file IO and token replacement for the default configuration files
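Token replacement in a configuration template can be sketched as follows. This is not the actual TemplateFileProcessor.java: the `${KEY}` token syntax, the class name, and the sample keys are assumptions chosen for illustration, and the template is held in a string rather than read from a file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateTokenSketch {
    // Assumed token syntax: ${some.key}; the real templates may use a different marker.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    /** Replaces each known token with its value; unknown tokens are left untouched. */
    public static String fill(String template, Map<String, String> values) {
        Matcher matcher = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            String replacement = values.containsKey(key) ? values.get(key) : matcher.group(0);
            // quoteReplacement keeps '$' and '\' in values from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("db.url", "jdbc:mysql://localhost/scrubber"); // hypothetical value
        System.out.println(fill("url=${db.url}\nuser=${db.user}", values));
    }
}
```

Leaving unknown tokens in place makes unfilled settings easy to spot in the generated file.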
4.3 Shell scripts
- setClassPath.sh sets the Java classpath and exports the shell variables
4.4 Shell UnitTest
- ScrubberPropertiesTest.java demonstrates binding scrubber.properties to shell commands.