Differential Privacy and Machine Learning: a Survey and Review

Preprint · English · Open Access
Ji, Zhanglong ; Lipton, Zachary C. ; Elkan, Charles (2014)
  • Subject: Computer Science - Learning | Computer Science - Cryptography and Security | Computer Science - Databases

The objective of machine learning is to extract useful information from data, while privacy is preserved by concealing information. Thus it seems hard to reconcile these competing interests, yet they frequently must be balanced when mining sensitive data. For example, medical research is an important application in which it is necessary both to extract useful information and to protect patient privacy. One way to resolve the conflict is to extract general characteristics of whole populations without disclosing the private information of individuals. In this paper, we consider differential privacy, one of the most popular and powerful definitions of privacy. We explore the interplay between machine learning and differential privacy, namely privacy-preserving machine learning algorithms and learning-based data release mechanisms. We also describe theoretical results that address what can be learned under differential privacy, as well as upper bounds on the loss incurred by differentially private algorithms. Finally, we present some open questions, including how to incorporate public data, how to deal with missing data in private datasets, and whether, as the number of observed samples grows arbitrarily large, differentially private machine learning algorithms can be achieved at no cost to utility compared with their non-private counterparts.
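As a concrete illustration of the kind of mechanism the abstract refers to (not code from the paper), the following is a minimal sketch of the Laplace mechanism for releasing an ε-differentially private mean. The function name and the assumption that the data are clipped to a known interval `[lower, upper]` are ours, introduced only for this example:

```python
import numpy as np

def private_mean(data, lower, upper, epsilon, rng=None):
    """Release the mean of `data` with epsilon-differential privacy
    via the Laplace mechanism, assuming values lie in [lower, upper]."""
    rng = rng if rng is not None else np.random.default_rng()
    data = np.clip(np.asarray(data, dtype=float), lower, upper)
    n = len(data)
    # Changing a single record moves the mean by at most (upper - lower) / n,
    # so that quantity is the global sensitivity of the mean query.
    sensitivity = (upper - lower) / n
    # Laplace noise with scale sensitivity / epsilon yields epsilon-DP.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return data.mean() + noise
```

Because the noise scale shrinks as 1/n, the released mean converges to the true population mean as the dataset grows, which is the intuition behind the "no cost to utility in the large-sample limit" question raised above.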