Measuring Re-identification Risk

Publication: Article, Preprint; 13 Jun 2023

Embargo end date: 01 Jan 2023

Language: English

Publisher: Association for Computing Machinery (ACM)

Journal: Proceedings of the ACM on Management of Data, volume 1, pages 1-26 (eISSN: 2836-6573)

Authors: CJ Carey; Travis Dick; Alessandro Epasto; Adel Javanmard; Josh Karlin; Shankar Kumar; Andres Muñoz Medina; +4 Authors

doi: 10.1145/3589294, 10.48550/arxiv.2304.07210

arXiv: 2304.07210


Abstract

Compact user representations (such as embeddings) form the backbone of personalization services. In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker can obtain the identity of a user from their representation. As an application, we show that our framework is general enough to model important real-world applications such as Chrome's Topics API for interest-based advertising. We complement our theoretical bounds with provably good attack algorithms for re-identification, which we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk, together with a framework for measuring it that can inform real-world applications.

Related Organizations

Google (United States)
United States

Keywords

FOS: Computer and information sciences, Machine Learning (cs.LG), Cryptography and Security (cs.CR)

Impact by BIP!

- Selected citations: 13. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Access routes: Green, Gold

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
