BioSpace Collaborative

PLoS | Computer Science - Non-Clinical Medicine - Science Policy

A Systematic Review of Re-Identification Attacks on Health Data
Published: Friday, December 02, 2011
Authors: Khaled El Emam, Elizabeth Jonker, Luk Arbuckle, Bradley Malin

Background

Privacy legislation in most jurisdictions allows health data to be disclosed for secondary purposes without patient consent if the data are de-identified. Some recent articles in the medical, legal, and computer science literature have argued that de-identification methods do not provide sufficient protection because they are easy to reverse. If this were the case, it would have significant implications for how health information is disclosed, including (a) potentially limiting its availability for secondary purposes such as research, and (b) resulting in more identifiable health information being disclosed. Our objectives in this systematic review were to (a) characterize known re-identification attacks on health data and contrast them with re-identification attacks on other kinds of data, (b) compute the overall proportion of records that have been correctly re-identified in these attacks, and (c) assess whether these attacks demonstrate weaknesses in current de-identification methods.

Methods and Findings

Searches were conducted in IEEE Xplore, the ACM Digital Library, and PubMed. After screening, fourteen eligible articles, each representing a distinct attack, were identified. On average, approximately a quarter of the records were re-identified across all studies (0.26; 95% CI 0.046–0.478), and approximately a third were re-identified in attacks on health data (0.34; 95% CI 0–0.744). The wide confidence intervals indicate considerable uncertainty around these proportions, and the mean proportion of records re-identified was sensitive to unpublished studies. Only two of the fourteen attacks were performed on data that had been de-identified using existing standards. Just one of these two attacks was on health data, and it had a success rate of 0.00013 (about 0.013% of records re-identified).
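
The pooled figures above combine per-study re-identification proportions into one mean with a 95% confidence interval. The abstract does not state which pooling model was used, so the following is only a minimal sketch, assuming a standard random-effects meta-analysis of raw proportions (the DerSimonian-Laird method) with a normal-approximation interval. The function name `pool_proportions_dl` and the input counts are hypothetical illustrations, not the authors' code or the fourteen studies from the review.

```python
import math


def pool_proportions_dl(events, totals, z=1.96):
    """Pool per-study proportions with a DerSimonian-Laird random-effects model.

    events[i] -- records correctly re-identified in study i (hypothetical)
    totals[i] -- records the attack targeted in study i (hypothetical)
    Returns (pooled proportion, CI lower, CI upper, tau^2).
    """
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need at least two studies with matching lengths")

    props, variances = [], []
    for x, n in zip(events, totals):
        # Continuity correction: add 0.5 to events and non-events when a study
        # has 0 or n successes, so its binomial variance is not zero.
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0
        p = x / n
        props.append(p)
        variances.append(p * (1.0 - p) / n)

    # Fixed-effect (inverse-variance) weights, pooled estimate, and Cochran's Q.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    p_fixed = sum(wi * pi for wi, pi in zip(w, props)) / sw
    q = sum(wi * (pi - p_fixed) ** 2 for wi, pi in zip(w, props))

    # Method-of-moments estimate of between-study variance tau^2 (floored at 0).
    df = len(props) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to every within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    p_re = sum(wi * pi for wi, pi in zip(w_re, props)) / sw_re
    se = math.sqrt(1.0 / sw_re)

    # Normal-approximation interval, clipped to [0, 1] because it is a proportion.
    lo = max(0.0, p_re - z * se)
    hi = min(1.0, p_re + z * se)
    return p_re, lo, hi, tau2


# Hypothetical counts for illustration only; NOT the studies in the review.
events = [30, 2, 450]
totals = [100, 50, 1000]
p, lo, hi, tau2 = pool_proportions_dl(events, totals)
print(f"pooled = {p:.3f}, 95% CI {lo:.3f}-{hi:.3f}, tau^2 = {tau2:.4f}")
```

Because tau^2 is added to every study's variance, a small number of heterogeneous studies produces a large standard error, so the interval becomes wide and its lower bound can be clipped at 0. This is one way such wide intervals arise; it is not a claim about the review's exact computation.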

Conclusions

The current evidence shows a high re-identification rate but is dominated by small-scale studies on data that was not de-identified according to existing standards. This evidence is insufficient to draw conclusions about the efficacy of de-identification methods.