CDM Seminar Series 2003-04
Automated, real-time disease surveillance systems are being created to monitor bioterrorism events as well as naturally occurring disease outbreaks. Syndromic surveillance systems detect outbreaks using algorithms and mathematical models of disease spread, alongside decision support systems that assist frontline clinicians in the event of a biological attack and robust electronic health information systems that support clinical care under disaster conditions. These systems use large amounts of patient-identifiable data from surrounding clinics and hospitals to recognize disease outbreaks with sufficient specificity to warrant a public health response. Although these data are statistically useful, patient privacy must not be compromised when a medical dataset is transferred. An anonymization system that transforms these patient-identifiable datasets into de-identified data would greatly improve the ability of many medical institutions to share their hospital records in a real-time system, though at the cost of some location-specific information.
The anonymization process used in this study retains a sense of where a patient lives by displacing the patient's geocoded address by a random offset drawn from a normal distribution scaled inversely to the local population density. This procedure gives each patient in the dataset a user-specified level of k-anonymity; that is, the patient cannot be re-identified among k other people in the local area. An analysis of the k-anonymity achievable in a given area is presented, along with an algorithm for fast, dynamic estimation of local k-anonymity. A stand-alone utility has been created to assist in the efficient spatial de-identification of large hospital datasets encoded in disparate formats.
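The displacement step described above can be illustrated with a minimal Python sketch. It is not the study's algorithm or utility: the rule for choosing the spread (pick sigma so that a circle of radius sigma holds about k people, assuming uniform density), the function names `sigma_for_k`, `estimate_k`, and `displace`, and the flat-earth conversion from kilometres to degrees are all assumptions made for illustration.

```python
import math
import random

# Approximate length of one degree of latitude in km (an approximation).
KM_PER_DEGREE_LAT = 111.32


def sigma_for_k(k, density_per_km2):
    """Assumed rule, not taken from the study: choose sigma (km) so that a
    circle of radius sigma contains about k people at uniform density."""
    if k <= 0 or density_per_km2 <= 0:
        raise ValueError("k and density must be positive")
    return math.sqrt(k / (math.pi * density_per_km2))


def estimate_k(density_per_km2, sigma_km):
    """Rough local k estimate: people inside a circle of radius sigma."""
    return density_per_km2 * math.pi * sigma_km ** 2


def displace(lat, lon, sigma_km, rng=random):
    """Shift a geocoded point by independent Gaussian offsets (east, north).

    Uses a local flat-earth approximation, reasonable for offsets of a few
    kilometres away from the poles.
    """
    dx_km = rng.gauss(0.0, sigma_km)  # east-west offset
    dy_km = rng.gauss(0.0, sigma_km)  # north-south offset
    new_lat = lat + dy_km / KM_PER_DEGREE_LAT
    new_lon = lon + dx_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return new_lat, new_lon


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    # Hypothetical inputs: k = 20 at two illustrative densities (people/km^2).
    for density in (5000.0, 50.0):
        sigma = sigma_for_k(20, density)
        print(density, round(sigma, 3), displace(42.36, -71.06, sigma, rng))
```

Under this assumed rule, a dense urban area receives a small displacement and a sparse rural area a larger one, so the same target k costs more location accuracy where fewer people live.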