Intended usages

We recommend starting with the default configuration and the prebuilt train/test models. All Scrubber settings live in a single file, scrubber.properties. The Apache UIMA, Apache cTAKES, and WEKA packages are loaded dynamically at runtime.

Default configuration

Developers can use Scrubber 3.X in "default mode" with the same settings as the provided train and test model files. Input and output settings are managed in scrubber.properties (file paths, database settings, method implementations).
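
As an illustration of keeping every setting in one properties file, the sketch below reads a few values with the standard java.util.Properties class. The key names and values (example.input.dir, example.jdbc.url, example.classifier.class) are hypothetical placeholders, not the keys Scrubber actually defines; check the scrubber.properties shipped with your distribution for the real ones.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch: read Scrubber-style settings from a .properties source. */
final class ScrubberConfigSketch {

    // Hypothetical keys and values, for illustration only.
    static final String EXAMPLE =
            "example.input.dir=/data/notes\n"
          + "example.jdbc.url=jdbc:mysql://localhost:3306/scrubber\n"
          + "example.classifier.class=com.example.MyClassifier\n";

    /** Parses properties text; a real program would pass a FileReader instead. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(EXAMPLE);
        System.out.println(props.getProperty("example.input.dir"));
        // getProperty(key, default) falls back when a key is absent.
        System.out.println(props.getProperty("example.output.format", "arff"));
    }
}
```

Grouping file paths, database settings, and implementation class names in one file means a deployment can be changed by editing text rather than code.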

Customize NLP pipeline

Scrubber uses Apache UIMA and Apache cTAKES packages, which together provide the NLP pipeline for lexical parsing and medical concept annotation. The generated feature sets are exported to a SQL database or to a model file (CSV or ARFF). The UIMA and cTAKES services that Scrubber uses are defined and configured in scrubber.properties.
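
To make the ARFF export target concrete, here is a minimal sketch that writes a tiny feature table in WEKA's ARFF text layout (@relation, @attribute, @data). The attribute names and rows are invented for illustration and are not Scrubber's actual feature set; the sketch also does no quoting, so it is only suitable for numeric and simple nominal values.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: serialize a small feature table as ARFF text. */
final class ArffExportSketch {

    /**
     * Builds ARFF text from attribute declarations (for example "x numeric")
     * and rows whose values are already formatted as strings.
     */
    static String toArff(String relation, List<String> attributeDecls, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String decl : attributeDecls) {
            sb.append("@attribute ").append(decl).append('\n');
        }
        sb.append("\n@data\n");
        for (String[] row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical features: token length, capitalization, dictionary hit, label.
        List<String> attrs = Arrays.asList(
                "token_length numeric",
                "is_capitalized {true,false}",
                "in_medical_dictionary {true,false}",
                "label {PHI,NOT_PHI}");
        List<String[]> rows = Arrays.asList(
                new String[] {"5", "true", "false", "PHI"},
                new String[] {"8", "false", "true", "NOT_PHI"});
        System.out.print(toArff("scrubber_features_demo", attrs, rows));
    }
}
```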

Customize Classifier

Scrubber can use different classifier implementations without recompiling the software.
By default, Scrubber dynamically loads the widely used WEKA C4.5 decision tree classifier, which supports multi-class labels.
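
One common way to swap implementations without recompiling is to instantiate a class by name at runtime. The sketch below shows that general technique with plain Java reflection; it is not taken from Scrubber's source. TokenClassifier and CapitalizedTokenRule are hypothetical stand-ins, and in a real deployment the class name would presumably come from scrubber.properties.

```java
/** Hypothetical plugin contract, for illustration only. */
interface TokenClassifier {
    String classify(String token);
}

/** Toy implementation: flags capitalized tokens as possible PHI. */
final class CapitalizedTokenRule implements TokenClassifier {
    @Override
    public String classify(String token) {
        boolean capitalized = !token.isEmpty() && Character.isUpperCase(token.charAt(0));
        return capitalized ? "POSSIBLE_PHI" : "NOT_PHI";
    }
}

/** Sketch: load a classifier implementation by its class name. */
final class PluginLoaderSketch {

    static TokenClassifier load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // asSubclass rejects classes that do not implement the contract.
            return cls.asSubclass(TokenClassifier.class)
                      .getDeclaredConstructor()
                      .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load classifier: " + className, e);
        }
    }

    public static void main(String[] args) {
        // The name is taken from the class itself so the demo runs in any package.
        TokenClassifier classifier = load(CapitalizedTokenRule.class.getName());
        System.out.println(classifier.classify("Boston"));
        System.out.println(classifier.classify("fever"));
    }
}
```

With WEKA on the classpath, the same lookup could be given weka.classifiers.trees.J48, which is WEKA's C4.5 implementation, and cast to weka.classifiers.Classifier instead of the stand-in interface used here.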

Features

Annotate word tokens and redact PHI from physician notes
cTAKES lexical parsing and medical dictionary annotation
WEKA multi-class decision tree classifier (plugin default)
Protege UI support for human expert curators (reads output)
Generate feature set of lexical properties, medical concept codes, and human defined rules
Compare lexical properties of public and private text sources
Distinguish private patient data from coded medical concepts and commonly used words
Prebuilt train and test models can be imported into WEKA (default), MATLAB, or R
(Default) Test your local physician notes without retraining
(Optional) Retrain the model using local physician note samples, publications, and medical dictionaries

Prerequisites

MySQL 5.5+
Linux
1GB+ memory recommended

Attachment: scrubber-3.x-runtime-guide.doc
