Disassociation for electronic health record privacy

Article English OPEN
Loukides, Grigorios ; Liagouris, John ; Gkoulalas-Divanis, Aris ; Terrovitis, Manolis (2014)
  • Publisher: Elsevier
  • Journal: Journal of Biomedical Informatics, volume 50, pages 46-61 (issn: 1532-0464)
  • Related identifiers: doi: 10.1016/j.jbi.2014.05.009
  • Subject: Computer Science Applications | Health Informatics | QA75 | QA76 | RA

The dissemination of Electronic Health Record (EHR) data beyond the originating healthcare institutions can enable large-scale, low-cost medical studies that have the potential to improve public health. Funding bodies such as the National Institutes of Health (NIH) in the U.S. therefore encourage or require the dissemination of EHR data, and a growing number of innovative medical investigations are being performed using such data. However, simply disseminating EHR data after removing identifying information may put privacy at risk, because patients can still be linked to their records based on diagnosis codes. This paper proposes the first approach that prevents this type of data linkage using disassociation, an operation that transforms records by splitting them into carefully selected subsets. Our approach preserves privacy with significantly lower data utility loss than existing methods and, unlike those methods, does not require data owners to specify diagnosis codes that may lead to identity disclosure. Consequently, it can be employed when data need to be shared broadly and used in studies beyond the intended ones. Through extensive experiments using EHR data, we demonstrate that our method can construct data that are highly useful for supporting various types of clinical case count studies and general medical analysis tasks.
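The splitting idea mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the function names `split_records` and `is_km_anonymous`, the example diagnosis codes, the hand-picked code groups, and the use of a k^m-anonymity-style check (every combination of at most m codes in a chunk occurs in at least k sub-records) are all assumptions made for illustration, since the abstract does not describe how the subsets are selected.

```python
from collections import Counter
from itertools import combinations
import random


def split_records(records, code_groups, seed=0):
    """Project each record onto every code group and shuffle each chunk.

    Shuffling each chunk independently hides which sub-records
    originally belonged to the same patient record.
    (Hypothetical helper, not taken from the paper.)
    """
    rng = random.Random(seed)
    chunks = []
    for group in code_groups:
        allowed = set(group)
        chunk = [sorted(set(rec) & allowed) for rec in records]
        chunk = [sub for sub in chunk if sub]  # drop empty projections
        rng.shuffle(chunk)
        chunks.append(chunk)
    return chunks


def is_km_anonymous(chunk, k, m):
    """Return True if every combination of up to m codes in the chunk
    appears in at least k of its sub-records (assumed privacy check)."""
    counts = Counter()
    for sub in chunk:
        codes = sorted(set(sub))
        for size in range(1, m + 1):
            counts.update(combinations(codes, size))
    return all(count >= k for count in counts.values())


# Illustrative records: each is a list of diagnosis-code strings.
records = [
    ["250.00", "401.9", "272.4"],
    ["250.00", "401.9"],
    ["250.00", "401.9", "493.90"],
    ["272.4", "493.90"],
    ["272.4", "493.90"],
]

# Hand-picked grouping of codes; a real method would select these subsets.
code_groups = [["250.00", "401.9"], ["272.4", "493.90"]]

chunks = split_records(records, code_groups)
original_ok = is_km_anonymous(records, k=2, m=2)
chunks_ok = all(is_km_anonymous(chunk, k=2, m=2) for chunk in chunks)
```

In this toy data the unsplit records fail the check because some code pairs occur in only one record, while each chunk passes it once the records are split along the chosen groups. The cost is that associations between codes in different chunks are no longer visible, which is the utility loss a real method would try to minimize.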