Using a peer-to-peer architecture, institutions become SPIN members (nodes) by securing institutional review board (IRB) approvals and deploying the SPIN software. An institution can withdraw from the network at any time without leaving its data behind or disabling the network. SPIN nodes can serve as peers, which query local databases, or as supernodes, which query networks of child nodes.
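The peer and supernode roles can be illustrated with a minimal sketch. It assumes that a query is a simple predicate over records and that a supernode merges whatever its children return. The class names `Peer` and `Supernode`, the `withdraw` method, and the record layout are hypothetical and are not taken from the SPIN software.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


@dataclass
class Peer:
    """Hypothetical node that answers queries from its own local database."""
    name: str
    records: List[Record]

    def query(self, predicate: Predicate) -> List[Record]:
        return [r for r in self.records if predicate(r)]


@dataclass
class Supernode:
    """Hypothetical node that forwards queries to child nodes and merges the answers."""
    name: str
    children: List[Union["Peer", "Supernode"]] = field(default_factory=list)

    def query(self, predicate: Predicate) -> List[Record]:
        results: List[Record] = []
        for child in self.children:
            results.extend(child.query(predicate))
        return results

    def withdraw(self, name: str) -> None:
        # Dropping a direct child removes its data from later results;
        # the remaining children keep answering queries.
        self.children = [c for c in self.children if c.name != name]
```

Because a supernode exposes the same `query` method as a peer, supernodes can be nested, and a withdrawing institution takes its records with it simply by leaving the tree.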

SPIN allows institutions to expose de-identified pathology reports while keeping the corresponding reports containing Protected Health Information (PHI) disconnected from the Internet. Each PHI report and its de-identified counterpart share a randomly generated unique identifier, which is recorded in a locally controlled codebook. The machine storing the codebook is disconnected from the Internet and protected according to each participating site's policies. The resulting solution is flexible and compliant with HIPAA regulations.
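As a rough sketch of the codebook idea, the example below links a random identifier to a PHI report and releases only the identifier together with the de-identified text. The `Codebook` class and its method names are hypothetical, the de-identification step itself is assumed to have happened elsewhere, and an in-memory dictionary stands in for whatever offline storage a site actually uses.

```python
import secrets
from typing import Dict, Tuple


class Codebook:
    """Hypothetical codebook, assumed to live on a machine that is kept offline."""

    def __init__(self) -> None:
        self._phi_by_id: Dict[str, str] = {}

    def register(self, phi_report: str, deidentified_report: str) -> Tuple[str, str]:
        # A random identifier carries no information about the patient.
        report_id = secrets.token_hex(16)
        self._phi_by_id[report_id] = phi_report
        # Only the identifier and the de-identified text are handed back for sharing.
        return report_id, deidentified_report

    def lookup_phi(self, report_id: str) -> str:
        # Re-identification is only possible where the codebook is stored.
        return self._phi_by_id[report_id]
```

The identifier is the only link between the two versions of a report, so a network query can reference a case without the PHI ever leaving the site.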

SPIN provides three levels of increasing access commensurate with investigator credentials and IRB approvals. First, feasibility studies are conducted using a statistical-level query that returns only aggregated results. Second, individual de-identified cases can be selected by investigators certified by one of the participating institutions. Third, investigators can request specimens and clinical data, subject to approval by the institution storing the requested data.
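The three levels can be sketched as an ordered enumeration that gates what a query returns. This is an illustration only: the names `AccessLevel`, `run_query`, and `request_specimen`, and the record layout, are assumptions and do not describe the actual SPIN interfaces.

```python
from enum import IntEnum
from typing import Dict, List


class AccessLevel(IntEnum):
    STATISTICAL = 1         # aggregated counts only
    DEIDENTIFIED_CASES = 2  # individual de-identified cases
    SPECIMEN_REQUEST = 3    # may request specimens and clinical data


def run_query(cases: List[Dict[str, str]], diagnosis: str, level: AccessLevel) -> Dict[str, object]:
    matches = [c for c in cases if c["diagnosis"] == diagnosis]
    result: Dict[str, object] = {"count": len(matches)}
    # Individual cases are included only from the second level upward.
    if level >= AccessLevel.DEIDENTIFIED_CASES:
        result["cases"] = matches
    return result


def request_specimen(level: AccessLevel, institution_approves: bool) -> str:
    if level < AccessLevel.SPECIMEN_REQUEST:
        raise PermissionError("specimen requests need the third access level")
    # The institution holding the data makes the final decision.
    return "approved" if institution_approves else "denied"
```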

Joining the Network

Our Extract, Transform, Load (ETL) toolkit requires minimal programming and expertise to get new types of health data flowing in large quantities. Because health data standards are not yet widely adopted, SPIN favors small, simplified schemas that can support large investigations with robust sample sizes. Under the control of each SPIN peer, the ETL toolkit also provides a set of anonymization and autocoding tools that map medical free text to standard vocabularies that are meaningful for research and public health investigations. This approach favors early adoption and the "low-hanging fruit": the first steps toward timely realization of a National Health Information Network. We have also published a set of decentralized IRB agreements for institutions as a guide for research centers such as those created by the NCI or CTSA.

Reusing agreements and systems already in place

Technology alone will not solve organizational issues or policy variance. We have modeled SPIN to fully recognize the care provider relationships, investigator data use agreements, and legal privacy definitions that regulate how freely data can flow between institutions. To leverage the infrastructure already in place, we have created a "Peer Group" model in which individual institutions choose whom to exchange data with and what types of data to share. For example, a single hospital may authorize neighboring hospitals to share de-identified cancer data for research. Other hospitals may collaborate for public health surveillance. As we deploy these systems across Harvard Medical School and elsewhere, we increasingly find that once the agreements are in place, multiple health applications can concurrently reside on the same SPIN network.
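One way to picture the Peer Group model is as a set of agreements, each naming its member institutions and the data types they have agreed to share. The sketch below is a simplification under that assumption; `PeerGroup`, `may_share`, and the example group and institution names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class PeerGroup:
    """Hypothetical record of one data-sharing agreement."""
    name: str
    members: FrozenSet[str]
    data_types: FrozenSet[str]


def may_share(groups: List[PeerGroup], sender: str, receiver: str, data_type: str) -> bool:
    # Data may flow only if both institutions belong to a group that covers this data type.
    return any(
        sender in g.members and receiver in g.members and data_type in g.data_types
        for g in groups
    )
```

Several groups can coexist on the same network, which mirrors the observation that multiple health applications can share one SPIN deployment once the agreements exist.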

Security = Anonymization & Authentication & Authorization

As we describe in our latest publications, this infrastructure must reassert hospitals as stewards of patient privacy and potential research benefit. Patient anonymization must occur locally at each participating peer institution. Authentication is the agreement between an investigator and a peer, who cryptographically agree on a federated identity service using digital signatures. Authorization is the access level set by each hospital for each type of investigation. Together, these security principles reflect the strong local control over privacy that engenders regional participation. This approach has enabled us to deploy new SPIN applications in a relatively short time.
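The interplay of authentication and authorization can be sketched as two checks: first verify that an identity assertion is genuine, then look up the access level the hospital has set for that type of investigation. Note the substitution: the Python standard library has no asymmetric digital signatures, so this sketch uses an HMAC with a shared key as a stand-in, which is not what a federated identity service would use in practice. All function names, the policy layout, and the integer levels are hypothetical.

```python
import hashlib
import hmac
from typing import Dict


def sign_assertion(identity_key: bytes, investigator: str) -> str:
    # Stand-in for a digital signature issued by the identity service (assumption: HMAC-SHA256).
    return hmac.new(identity_key, investigator.encode("utf-8"), hashlib.sha256).hexdigest()


def is_authentic(identity_key: bytes, investigator: str, signature: str) -> bool:
    expected = sign_assertion(identity_key, investigator)
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected, signature)


def access_level(
    identity_key: bytes,
    investigator: str,
    signature: str,
    hospital_policy: Dict[str, int],
    investigation_type: str,
) -> int:
    # Authentication: reject assertions that the identity service did not sign.
    if not is_authentic(identity_key, investigator, signature):
        return 0
    # Authorization: each hospital sets its own level per investigation type; default is no access.
    return hospital_policy.get(investigation_type, 0)
```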

Lightweight Query Interfaces

Our query interfaces can operate in both synchronous and asynchronous modes, depending on the needs of the investigation. For example, some investigators (or automated analysis agents) may only be interested in the final aggregated result. Other investigations (such as pathology case lookups) may prefer an "on the fly" view as each peer responds. In both cases, our aggregation mechanisms are completely agnostic to the data being transported and never persist data as it is routed.
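The two styles of result delivery can be sketched with `asyncio`: one function waits for every peer and returns only the final aggregate, the other reports a running total as each peer responds. This is an illustration under simplifying assumptions: peers are simulated with `asyncio.sleep`, each response is a plain count, and the names `query_peer`, `final_result`, and `streaming_results` are hypothetical. The aggregator keeps only a running total and stores no peer data.

```python
import asyncio
from typing import Callable, List, Tuple

PeerSpec = Tuple[str, float, int]  # (peer name, simulated delay in seconds, match count)


async def query_peer(name: str, delay: float, count: int) -> Tuple[str, int]:
    # Simulated network round trip to one peer.
    await asyncio.sleep(delay)
    return name, count


async def final_result(peers: List[PeerSpec]) -> int:
    # Wait for every peer, then return only the aggregated total.
    responses = await asyncio.gather(*(query_peer(*p) for p in peers))
    return sum(count for _, count in responses)


async def streaming_results(peers: List[PeerSpec], on_partial: Callable[[str, int], None]) -> int:
    # Report a running total "on the fly" as each peer responds.
    total = 0
    for next_response in asyncio.as_completed([query_peer(*p) for p in peers]):
        name, count = await next_response
        total += count
        on_partial(name, total)
    return total
```

Both functions return the same total; they differ only in when the caller sees intermediate results, for example `asyncio.run(final_result(peers))` versus passing a callback to `streaming_results`.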
