Differential privacy for survival analysis and user data collection

Name: Differential privacy for survival analysis and user data collection
Creator: Nguyen, Thong T.
Keywords: DRNTU::Engineering::Computer science and engineering::Mathematics of computing::Probability and statistics; DRNTU::Engineering::Computer science and engineering::Information systems::Database management; 330

Type: Doctoral thesis
Date: 11 Sep 2019
Publisher: Nanyang Technological University

Authors: Nguyen, Thong T.

doi: 10.32657/10220/48212

handle: 10356/85347, 10220/48212

Abstract

Most personal information now exists as digital data, including sensitive information such as medical records, credit card information, and private instant messages. In this research, we investigate the data privacy problem in collecting and mining sensitive user information. We focus on: (i) data privacy in survival analysis, which uses medical records to learn useful survival models in medical research; and (ii) data privacy in user data collection, which is current practice at many corporations and governments. We use differential privacy, the gold standard in privacy protection, to address the data privacy problem in survival analysis and user data collection. To this end, we aim to achieve the following:

• Guaranteeing privacy for survival analysis models, which include (i) parametric and nonparametric survival models; and (ii) continuous-time and discrete-time survival regression models.
• Guaranteeing privacy for users whose data is collected by corporations and governments.

The main contributions of this thesis are as follows:

• For nonparametric survival models, we have proposed a private mechanism for two popular nonparametric estimators, namely the Kaplan-Meier estimator and the Nelson-Aalen estimator. For parametric survival models, we have proposed a simple private mechanism for accurately estimating the parameter of the exponential distribution, and a private mechanism based on the local sensitivity concept for estimating the parameters of the Weibull distribution.
• For estimating uncertainty in parametric survival models, we have proposed a private framework that allows learning the posterior function. We have applied this framework to parametric models with the Weibull distribution and to flexible parametric models.
• We have proposed three private approaches for estimating the discrete-time survival regression model: the extended output perturbation approach, the extended objective perturbation approach, and the posterior sampling approach.
• We have proposed a posterior sampling approach for the continuous-time survival regression model, and a posterior perturbing approach that supports a relaxation of differential privacy for scenarios in which differential privacy is impractical.
• For user data collection, we have proposed mechanisms that allow each user to publish a randomized vector of categorical and numerical data. The proposed mechanisms are asymptotically optimal in both accuracy and run-time. We have also applied them to supervised learning problems under the empirical risk minimization framework.

Doctor of Philosophy

Related Organizations

Nanyang Technological University
Singapore

Impact (by BIP!)

Selected citations: 1
Popularity: Average
Influence: Average
Impulse: Average