Secure Health Information Sharing System: SHARE

Min Wu BS, Lik Mui MEng MPhil, Mojdeh Mohtashemi PhD, Peter Szolovits PhD

Clinical Decision Making Group

MIT Laboratory for Computer Science

200 Technology Square, Cambridge, MA 02139

Sharing patient health information across multiple institutions while maintaining patient confidentiality remains difficult in practice. We introduce SHARE, a web-based system that automates most of the steps necessary to create a secure information sharing system that supports multi-center studies. It relies on SSL for secure data communication and uses a decentralized name space based on the Simple Distributed Security Infrastructure (SDSI) [1] for authentication and authorization. SHARE implements a layered encryption scheme for identifiers, so that the identity of the patient, and even of the site from which the patient’s data originate, can be recovered only with the cooperation of various authorities, as dictated by study policy [2].

SHARE’s functionality can be divided into three major components: study creation, data collection, and patient re-identification. We assume that when a new multi-center study is approved, one person is designated as the “study generator”. She receives an authenticated certificate that allows her to use a secure web site we have developed to define the study database, certify “study administrators” who will manage data for the study, and create a set of customized tools to be installed at each site that contributes data. Data collection begins when a study administrator logs in to the study site server and specifies the available data sources and their mapping to the study database. The study site server then contacts the provider daemon at each source site server to collect study-related data. SHARE ensures that each patient’s “source ID” (identifier at the data source) is dynamically encrypted into her “study ID” (identifier at the study site) when data are transferred to the study database. To support study protocols that permit patient re-identification from the study site back to the source sites in order to collect follow-up data, a layered encryption scheme is used to create the study ID: Study ID = studyOMB^public(sourceIRB^public(source ID), source site name), where X^public(...) denotes encryption with the public key of X, studyOMB is the ombudsman responsible for verifying permission to re-identify, and sourceIRB is the IRB of the source site that sets policy for its patients’ data. Thus, a researcher who needs additional information about a subject uses the study site server to send a re-identification request to the studyOMB. If permission is granted, the ombudsman decrypts the study ID, recovers the source site name, and sends a request with the still-encrypted source ID to the appropriate sourceIRB. The sourceIRB can then decrypt the source ID, retrieve the requested data, and return them to the study site server, which in turn notifies the researcher that they are now available at the study site.
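The layered construction of the study ID can be illustrated with a minimal Java sketch. It is not SHARE's actual code: the class and method names (LayeredStudyId, makeStudyId, ombudsmanOpen, irbOpen), the choice of RSA with OAEP padding, the key sizes, and the sample identifiers are all assumptions made for illustration. The outer key is chosen larger than the inner key only so that the inner ciphertext plus a short site name fits in a single RSA block; a production design would more likely use hybrid encryption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.Key;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

// Hypothetical illustration of: Study ID = studyOMB^public(sourceIRB^public(source ID), site name)
class LayeredStudyId {
    // Assumed cipher; the abstract does not say which public-key scheme SHARE uses.
    private static final String RSA_OAEP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newKeyPair(int bits) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(bits);
        return gen.generateKeyPair();
    }

    private static byte[] rsa(int mode, Key key, byte[] input) throws Exception {
        Cipher cipher = Cipher.getInstance(RSA_OAEP);
        cipher.init(mode, key);
        return cipher.doFinal(input);
    }

    // Source site: inner layer under the source IRB's key, outer layer under the ombudsman's key.
    static byte[] makeStudyId(String sourceId, String siteName,
                              PublicKey irbPublic, PublicKey ombPublic) throws Exception {
        byte[] inner = rsa(Cipher.ENCRYPT_MODE, irbPublic, sourceId.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(inner.length); // length prefix so the two fields can be separated again
        out.write(inner);
        out.writeUTF(siteName);     // must be short enough to fit the outer RSA block
        out.flush();
        return rsa(Cipher.ENCRYPT_MODE, ombPublic, buffer.toByteArray());
    }

    // What the ombudsman learns: the site name and a source ID it still cannot read.
    static final class Referral {
        final String siteName;
        final byte[] encryptedSourceId;

        Referral(String siteName, byte[] encryptedSourceId) {
            this.siteName = siteName;
            this.encryptedSourceId = encryptedSourceId;
        }
    }

    // Ombudsman: removes the outer layer only.
    static Referral ombudsmanOpen(byte[] studyId, PrivateKey ombPrivate) throws Exception {
        byte[] payload = rsa(Cipher.DECRYPT_MODE, ombPrivate, studyId);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
        byte[] inner = new byte[in.readInt()];
        in.readFully(inner);
        return new Referral(in.readUTF(), inner);
    }

    // Source IRB: removes the inner layer and recovers the source ID.
    static String irbOpen(byte[] encryptedSourceId, PrivateKey irbPrivate) throws Exception {
        return new String(rsa(Cipher.DECRYPT_MODE, irbPrivate, encryptedSourceId), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        KeyPair irb = newKeyPair(2048); // inner layer: 256-byte ciphertext
        KeyPair omb = newKeyPair(3072); // outer layer: room for 256 bytes plus a short site name
        byte[] studyId = makeStudyId("MRN-0012345", "hospital-a", irb.getPublic(), omb.getPublic());
        Referral referral = ombudsmanOpen(studyId, omb.getPrivate());
        System.out.println(referral.siteName);
        System.out.println(irbOpen(referral.encryptedSourceId, irb.getPrivate()));
    }
}
```

In this sketch neither authority can re-identify a patient alone: the ombudsman recovers only the site name, and the source IRB never sees a study ID unless the ombudsman forwards the inner ciphertext. Note that OAEP padding is randomized, so encrypting the same source ID twice yields different study IDs; a system that must link repeated records for one patient would need a deterministic scheme or a stored mapping, a detail the description above does not specify.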

SHARE is written in Java. We use Java servlets and JSP integrated with JDBC to support database-backed web sites and to provide the on-line database manipulation. We use JSSE to implement the SSL protocol to provide the transport level security for all the client-server and server-server communications. SDSI certificates are used by SHARE’s authentication protocols both to authenticate the various roles above and to determine their authorizations.

Patient privacy is protected at the central study database by hiding patient identity behind an encrypted study ID, loading only study-related data, authenticating database access, and using an authority-controlled scheme for patient re-identification. Although unique features of a patient’s data pose the risk that she may be re-identified using knowledge external to the study [3], access to the data is limited to researchers, who are in turn bound by professional ethics and institutional oversight not to violate patient privacy. This contrasts with scenarios in which study data are to be publicly released, which SHARE does not support. People and institutions often have the good will to try to protect the confidentiality of patient data in studies, but practical difficulties frequently lead them to use data easily associated with the identity of the patient. By automating many of the steps needed to support better practice, we hope that SHARE can contribute to an improvement in patient data privacy.

References

[1] MIT CIS Group (2001) A Simple Distributed Security Infrastructure (SDSI), http://theory.lcs.mit.edu/~cis/sdsi.html

[2] Kohane, I.S., H. Dong, and P. Szolovits (1998) Health Information Identification and De-Identification Toolkit, in Proc AMIA Symp, pp. 356-60

[3] Sweeney, L. (2001) Computational Disclosure Control: A Primer on Data Privacy Protection