ZENODO
Presentation . 2022
License: CC BY
Data sources: Datacite
ZENODO
Other literature type . 2022
License: CC BY
Data sources: ZENODO

New Developments in Synthetic Data Generation

Authors: Brett Beaulieu-Jones; Jingchen (Monika) Hu; Aaron R. Williams

Abstract

These talks were presented at the Privacy Day Webinar 2022, sponsored by the American Statistical Association's Committee on Privacy and Confidentiality.

Talk 1: "The potential of privacy-preserving generative deep neural networks to support clinical data sharing"
Brett Beaulieu-Jones, Harvard Medical School

Abstract: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier. Deep generative adversarial networks have the potential to produce synthetic data while maintaining privacy. In some cases, the synthetic data has been shown to maintain statistical properties of the source data and to be indistinguishable from it to human experts. This raises two important questions: 1) How can we do this? and 2) What is the privacy-preserving synthetic data good for?

Brett Beaulieu-Jones is an Instructor of Biomedical Informatics at Harvard Medical School. He obtained his PhD from the Perelman School of Medicine at the University of Pennsylvania under the supervision of Dr. Jason Moore and Dr. Casey Greene. Beaulieu-Jones’ doctoral research focused on using machine learning-based methods to more precisely define phenotypes from large-scale biomedical data repositories, e.g., those contained in clinical records. He joined Dr. Isaac Kohane’s lab for his postdoc, where he has focused on using machine learning to better understand the heterogeneity of neurological diseases and conditions, specifically Parkinson’s disease and epilepsy. He is a former general chair of, and serves on the advisory board for, the Machine Learning for Health Workshop at NeurIPS, and is a founding board member of the Association for Health Learning and Inference (parent organization of ML4H and CHIL).

Talk 2: "Incorporating disclosure risk in designing data synthesis models"
Jingchen (Monika) Hu, Vassar College

Abstract: The generation and release of synthetic data can facilitate microdata dissemination by statistical agencies. Often, agencies need to strike a desirable balance in the utility-risk trade-off of the synthetic data. We propose a novel approach that can incorporate the disclosure risk of each record in designing any Bayesian synthesis model. In this way, records with a higher risk can receive a higher level of protection in the resulting synthetic data. We illustrate our methods with an application to the Consumer Expenditure Surveys of the U.S. Bureau of Labor Statistics.

Jingchen (Monika) Hu is an assistant professor of statistics at Vassar College. Her research focuses on statistical data privacy methods, mainly synthetic data and differential privacy. She teaches a senior seminar on statistical data privacy at Vassar and engages undergraduate students in learning cutting-edge methods and conducting applied research in this area.

Talk 3: "Fully synthetic microdata for public policy analysis"
Aaron R. Williams, Urban Institute

Abstract: Government agencies possess data that could be invaluable for evaluating public policy but often do not publicly release the data due to disclosure concerns. For instance, the IRS has rich administrative data about tax filers and about Americans with incomes below the income tax filing threshold, but access is restricted to a select few with IRS clearance. This talk gives an overview of the generation of the fully synthetic 2012 IRS Statistics of Income Division supplemental public use file, ongoing work in generating the fully synthetic 2012 IRS SOI PUF, and a formally private validation server for analysis with tax data. We use sequential Classification and Regression Trees (CART) and kernel density smoothing to create useful microlevel data with disclosure protection. We test and evaluate the trade-offs between data utility and disclosure risk of different parameterizations using a variety of validation metrics. The resulting synthetic data sets have high utility, particularly for summary statistics and microsimulation, and low disclosure risk.

Aaron R. Williams is a senior data scientist in the Income and Benefits Policy Center at the Urban Institute, where he works on retirement policy, microsimulation models, data privacy, and data imputation methods. He has worked on Urban’s Dynamic Simulation of Income (DYNASIM) microsimulation model, the Social Security Administration’s Modeling Income in the Near Term (MINT) microsimulation model, and the Tax Policy Center’s synthesis of individual tax records. Williams is an adjunct professor in the McCourt School of Public Policy at Georgetown University.
