Generating sequential electronic health records using dual adversarial autoencoder

Publication: Article, 01 Sep 2020, English
Publisher: Oxford University Press (OUP)
Journal: Journal of the American Medical Informatics Association, volume 27, pages 1411-1419 (issn: 1067-5027, eissn: 1527-974X)

Authors: Dongha Lee; Hwanjo Yu; Xiaoqian Jiang; Deevakar Rogith; Meghana Gudala; Mubeen Tejani; Qiuchen Zhang; +1 Authors

doi: 10.1093/jamia/ocaa119

pmid: 32989459

pmc: PMC7647348

Abstract

Objective: Recent studies on electronic health records (EHRs) have begun to learn deep generative models and synthesize large numbers of realistic records in order to address significant privacy issues surrounding EHRs. However, most of them focus only on structured records of patients’ independent visits rather than on chronological clinical records. In this article, we aim to learn and synthesize realistic sequences of EHRs based on the generative autoencoder.

Materials and Methods: We propose a dual adversarial autoencoder (DAAE), which learns set-valued sequences of medical entities by combining a recurrent autoencoder with 2 generative adversarial networks (GANs). DAAE improves the mode coverage and quality of generated sequences by adversarially learning both the continuous latent distribution and the discrete data distribution. Using the MIMIC-III (Medical Information Mart for Intensive Care-III) and UT Physicians clinical databases, we evaluated the performance of DAAE in terms of predictive modeling, plausibility, and privacy preservation.

Results: Our generated sequences of EHRs showed performance comparable to that of real data on a predictive modeling task and achieved the best score among all baseline models in the plausibility evaluation conducted by medical experts. In addition, differentially private optimization of our model enables the generation of synthetic sequences without increasing the privacy leakage of patients’ data.

Conclusions: DAAE can effectively synthesize sequential EHRs by addressing the main challenges of this task: the synthetic records should be realistic enough not to be distinguished from the real records, and they should cover all the training patients to reproduce the performance of specific downstream tasks.
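
As a rough illustration of what the abstract calls "set-valued sequences of medical entities", the sketch below represents one patient as an ordered list of visits, each visit being an unordered set of codes, and encodes it as a sequence of multi-hot vectors. This is only an assumed, commonly used input representation; the codes, the vocabulary, and the function names `build_vocab` and `encode_multi_hot` are hypothetical and are not taken from the article.

```python
from typing import Dict, List, Set

# Hypothetical patient record: an ordered list of visits, where each visit
# is an unordered set of medical codes. The codes are made up for illustration.
patient_visits: List[Set[str]] = [
    {"DX_A", "RX_B"},
    {"DX_A", "DX_C", "PROC_D"},
    {"RX_B"},
]


def build_vocab(records: List[List[Set[str]]]) -> Dict[str, int]:
    """Map each distinct code to a column index (sorted for determinism)."""
    codes = sorted({code for visits in records for visit in visits for code in visit})
    return {code: index for index, code in enumerate(codes)}


def encode_multi_hot(visits: List[Set[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """Turn a sequence of code sets into one 0/1 vector per visit."""
    rows = []
    for visit in visits:
        row = [0] * len(vocab)
        for code in visit:
            row[vocab[code]] = 1  # mark the code as present in this visit
        rows.append(row)
    return rows


vocab = build_vocab([patient_visits])
encoded = encode_multi_hot(patient_visits, vocab)

for visit_index, row in enumerate(encoded):
    print(visit_index, row)
```

Each row corresponds to one visit and each column to one code, so the order of rows carries the chronology while the order of codes within a visit is irrelevant.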

Related Organizations

Pohang University of Science and Technology
Korea (Republic of)
Emory University
United States
The University of Texas Health Science Center at Houston
United States

Keywords

generative autoencoder, generative adversarial networks (GANs), Machine Learning, electronic health records (EHRs), differential privacy, Electronic Health Records, Humans, Computer Simulation, Neural Networks, Computer, sequential data generation, Confidentiality, Software

Impact by BIP!

selected citations: 47. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.