publication . Preprint . 2017

Information, Privacy and Stability in Adaptive Data Analysis

Smith, Adam;
Open Access English
  • Published: 02 Jun 2017
Traditional statistical theory assumes that the analysis to be performed on a given data set is selected independently of the data themselves. This assumption breaks downs when data are re-used across analyses and the analysis to be performed at a given stage depends on the results of earlier stages. Such dependency can arise when the same data are used by several scientific studies, or when a single analysis consists of multiple stages. How can we draw statistically valid conclusions when data are re-used? This is the focus of a recent and active line of work. At a high level, these results show that limiting the information revealed by earlier stages of analys...
free text keywords: Statistics - Machine Learning, Computer Science - Learning
Funded by
NSF| BIGDATA: F: DKA: Scalable, Private Algorithms for Continual Data Analysis
  • Funder: National Science Foundation (NSF)
  • Project Code: 1447700
  • Funding stream: Directorate for Computer & Information Science & Engineering | Division of Information and Intelligent Systems
Download from
43 references, page 1 of 3

[1] M. S. Alvim, K. Chatzikokolakis, C. Palamidessi, and G. Smith. Measuring information leakage using generalized gain functions. In 25th IEEE Computer Security Foundations Symposium (CSF), pages 265{279, 2012.

[2] M. S. Alvim, K. Chatzikokolakis, A. McIver, C. Morgan, C. Palamidessi, and G. Smith. Additive and multiplicative notions of leakage, and their capacities. In 27th IEEE Computer Security Foundations Symposium (CSF), pages 308{322, 2014.

[3] M. S. Alvim, K. Chatzikokolakis, A. McIver, C. Morgan, C. Palamidessi, and G. Smith. Axioms for information leakage. In 29th IEEE Computer Security Foundations Symposium (CSF), pages 77{92, 2016. [OpenAIRE]

[4] R. Bassily and Y. Freund. Typical stability. arXiv:1604.03336 [cs.LG], April 2016. [OpenAIRE]

[5] R. Bassily, A. Smith, and A. Thakurta. Private empirical risk minimization: E cient algorithms and tight error bounds. In IEEE Symposium on the Foundations of Computer Science (FOCS), pages 464{473, 2014. [OpenAIRE]

[6] R. Bassily, K. Nissim, A. Smith, T. Steinke, U. Stemmer, and J. Ullman. Algorithmic stability for adaptive data analysis. In 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 1046{1059, 2016.

[7] R. Berk, L. Brown, A. Buja, K. Zhang, and L. Zhao. Valid post-selection inference. The Annals of Statistics, 41(2):802{837, 2013. [OpenAIRE]

[8] A. Blum and M. Hardt. The ladder: A reliable leaderboard for machine learning competitions. In Proc. 32 nd International Conference on Machine Learning, 2015. arXiv:1502.04585. [OpenAIRE]

[9] O. Bousquet and A. Elissee . Stability and generalization. Journal of Machine Learning Research, 2: 499{526, 2002.

[10] A. Buja, R. Berk, L. Brown, E. George, E. Pitkin, M. Traskin, L. Zhao, and K. Zhang. Models as approximations|a conspiracy of random regressors and model deviations against classical inference in regression. Statistical Science, 1460, 2015.

[11] M. Bun and T. Steinke. Concentrated di erential privacy: Simpli cations, extensions, and lower bounds. In Theory of Cryptography Conference (TCC) 2016-B, 2016. arxiv:1605.02065. [OpenAIRE]

[12] N. Ciganovic, N. J. Beaudry, and R. Renner. Smooth max-information as one-shot generalization for mutual information. IEEE Trans. Information Theory, 60(3):1573{1581, 2014.

[13] R. Cummings, K. Ligett, K. Nissim, A. Roth, and Z. S. Wu. Adaptive learning with robust generalization guarantees. In 29th Annual Conference on Learning Theory, pages 772{814, 2016. [OpenAIRE]

[14] L. Devroye and T. Wagner. Distribution-free performance bounds for potential function rules. IEEE Transactions on Information Theory, 25(5):601{604, 1979.

[15] Y. Dodis, R. Ostrovsky, L. Reyzin, and A. Smith. Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. SIAM J. Comput., 38(1):97{139, 2008.

43 references, page 1 of 3
Powered by OpenAIRE Open Research Graph
Any information missing or wrong?Report an Issue