Topics API Analysis

This repository provides the experimental results of the paper The Privacy-Utility Trade-off in the Topics API.

Usage

The notebooks were run using:

- Python v3.11.8
- bvmlib v1.0.0
- matplotlib 3.8.0
- numpy 1.24.3
- pandas 2.0.1
- qif 1.2.3
- requests 2.31.0
- scipy 1.11.3
- tldextract 5.1.2
- tqdm 4.66.1
- urllib3 1.26.16

The datasets produced for the experiments can be found on Zenodo: AOL Dataset for Browsing History and Topics of Interest (DOI: 10.5281/zenodo.11029572).

Notebooks

Data treatment:

- AOL-data-treatment.ipynb: Converts the original AOL dataset.
  - Treats inconsistencies;
  - Randomly remaps AnonID to RandID;
  - Defines domains from URLs; and
  - Filters domains by eTLD using tldextract and Mozilla's Public Suffix List, as of commit 5e6ac3a, extended by the discontinued TLDs: .bg.ac.yu, .ac.yu, .cg.yu, .co.yu, .edu.yu, .gov.yu, .net.yu, .org.yu, .yu, .or.tp, .tp, and .an.
  Generates the datasets AOL-treated.csv and AOL-treated-unique-domains.csv.
  The dataset AOL-treated.csv can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies. This dataset contains singletons (individuals with only one domain in their browsing histories) and one outlier (one user with 150,802 domain visits in three months) that are dropped in some analyses.

- Citizen-Lab-Classification-data-treatment.ipynb: Converts the Citizen Lab Classification data, as of commit ebd0ee8.
  - Treats inconsistencies;
  - Defines domains from URLs;
  - Filters domains by eTLD using tldextract and Mozilla's Public Suffix List, as of commit 5e6ac3a, extended by the discontinued TLDs: .bg.ac.yu, .ac.yu, .cg.yu, .co.yu, .edu.yu, .gov.yu, .net.yu, .org.yu, .yu, .or.tp, .tp, and .an; and
  - Merges classifications by domain.
  Generates the dataset Citizen-Lab-Classification.csv.

- AOL-treated-Citizen-Lab-Classification-domain-matching.ipynb: Matches domains from AOL-treated-unique-domains.csv with domains and respective topics from Citizen-Lab-Classification.csv.
  Generates the dataset AOL-treated-Citizen-Lab-Classification-domain-match.csv.

- AOL-treated-Google-Topics-Classification-v1-domain-matching.ipynb: Matches domains from AOL-treated-unique-domains.csv with domains and respective topics from Google-Topics-Classification-v1.txt, as provided by Google with the Chrome browser.
  Generates the dataset AOL-treated-Google-Topics-Classification-v1-domain-match.csv.

- AOL-reduced-Citizen-Lab-Classification.ipynb: Converts the dataset AOL-treated.csv.
  - Reduces the dataset AOL-treated.csv according to the dataset AOL-treated-Citizen-Lab-Classification-domain-match.csv.
  Generates the dataset AOL-reduced-Citizen-Lab-Classification.csv.
  The dataset AOL-reduced-Citizen-Lab-Classification.csv can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies, and for analyses of topics of interest vulnerability and utility, as enabled by the Topics API. This dataset contains singletons and the outlier that are dropped in some analyses. This dataset can be used for analyses including the (data-dependent) randomness of trimming down or filling up the top-s sets of topics for each individual so each set has s topics. Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.

- AOL-reduced-Google-Topics-Classification-v1.ipynb: Converts the dataset AOL-treated.csv.
  - Reduces the dataset AOL-treated.csv according to the dataset AOL-treated-Google-Topics-Classification-v1-domain-match.csv.
  Generates the dataset AOL-reduced-Google-Topics-Classification-v1.csv.
  The dataset AOL-reduced-Google-Topics-Classification-v1.csv can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies, and for analyses of topics of interest vulnerability and utility, as enabled by the Topics API. This dataset contains singletons and the outlier that are dropped in some analyses. This dataset can be used for analyses including the (data-dependent) randomness of trimming down or filling up the top-s sets of topics for each individual so each set has s topics. Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.

- AOL-experimental.ipynb: Converts the dataset AOL-treated.csv.
  - Drops singletons (individuals with only one domain in their browsing histories) and one outlier (one user with 150,802 domain visits in three months); and
  - Defines browsing histories.
  Generates the dataset AOL-experimental.csv.
  The dataset AOL-experimental.csv can be used to empirically verify code correctness. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.

- AOL-experimental-Citizen-Lab-Classification.ipynb: Converts the dataset AOL-reduced-Citizen-Lab-Classification.csv.
  Generates the dataset AOL-experimental-Citizen-Lab-Classification.csv.
  The dataset AOL-experimental-Citizen-Lab-Classification.csv can be used to empirically verify code correctness. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.

- AOL-experimental-Google-Topics-Classification-v1.ipynb: Converts the dataset AOL-reduced-Google-Topics-Classification-v1.csv.
  Generates the dataset AOL-experimental-Google-Topics-Classification-v1.csv.
  The dataset AOL-experimental-Google-Topics-Classification-v1.csv can be used to empirically verify code correctness. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.

Analyses:

- QIF-analyses-AOL-treated.ipynb: QIF analyses based on the dataset AOL-treated.csv. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.

- QIF-analyses-AOL-reduced-Citizen-Lab.ipynb: QIF analyses based on the dataset AOL-reduced-Citizen-Lab-Classification.csv. Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.

- QIF-analyses-AOL-reduced-Google-Topics-v1.ipynb: QIF analyses based on the dataset AOL-reduced-Google-Topics-Classification-v1.csv. Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.

- QIF-analyses-counting-experiment.ipynb: QIF analysis for counting topics popularity using the binomial distribution.

- QIF-analyses-AOL-experimental.ipynb: QIF analyses based on the dataset AOL-experimental.csv. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.

- QIF-analyses-AOL-experimental-Citizen-Lab.ipynb: QIF analyses based on the dataset AOL-experimental-Citizen-Lab-Classification.csv. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.

- QIF-analyses-AOL-experimental-Google-Topics-v1.ipynb: QIF analyses based on the dataset AOL-experimental-Google-Topics-Classification-v1.csv. All privacy and utility results are expected to remain the same with each run of the analyses over this dataset.

License

GNU GPLv3. To understand how the various GNU licenses are compatible with each other, please refer to the GNU licenses FAQ.
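Illustrative sketch: remapping AnonID to RandID

The data treatment for AOL-data-treatment.ipynb includes a step that randomly remaps AnonID to RandID. The following is a minimal sketch of one way such a remapping could work, not the notebook's actual code. The helper name remap_ids, the seed parameter, and the choice of consecutive integers as new identifiers are all assumptions made for illustration.

```python
import random


def remap_ids(anon_ids, seed=None):
    """Map each distinct AnonID to a distinct random integer in [0, n).

    Hypothetical helper: the name, the seed parameter, and the use of
    consecutive integers as RandIDs are illustrative assumptions.
    """
    distinct = sorted(set(anon_ids))        # deterministic starting order
    rand_ids = list(range(len(distinct)))   # candidate new identifiers
    random.Random(seed).shuffle(rand_ids)   # random permutation of the new IDs
    return dict(zip(distinct, rand_ids))


# Toy records (AnonID, URL); the values are made up.
records = [
    (142, "http://a.example"),
    (217, "http://b.example"),
    (142, "http://c.example"),
]
mapping = remap_ids([anon_id for anon_id, _ in records], seed=0)
remapped = [(mapping[anon_id], url) for anon_id, url in records]
```

Because the mapping is a permutation, records of the same individual stay linked to each other while the original identifier no longer appears in the output.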
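Illustrative sketch: dropping singletons and the outlier

AOL-experimental.ipynb is described as dropping singletons (individuals with only one domain in their browsing histories) and one outlier. The sketch below shows that filtering on toy data using only the standard library. It reads "only one domain" as one distinct domain, and it excludes the outlier through a max_visits threshold; both the function name and the threshold are assumptions, and the notebook may identify the outlier differently.

```python
from collections import defaultdict


def drop_singletons_and_outliers(visits, max_visits):
    """Keep users with more than one distinct domain and at most max_visits visits.

    visits is a list of (user_id, domain) pairs. The max_visits threshold
    is a hypothetical way to exclude the outlier.
    """
    domains = defaultdict(set)   # user -> distinct domains visited
    counts = defaultdict(int)    # user -> total number of visits
    for user, domain in visits:
        domains[user].add(domain)
        counts[user] += 1
    keep = {
        user
        for user in domains
        if len(domains[user]) > 1 and counts[user] <= max_visits
    }
    return [(user, domain) for user, domain in visits if user in keep]


# Toy data: user 2 is a singleton, user 3 exceeds the visit threshold.
visits = [
    (1, "a.com"), (1, "b.com"),
    (2, "a.com"), (2, "a.com"),
    (3, "a.com"), (3, "b.com"), (3, "c.com"), (3, "a.com"),
]
kept = drop_singletons_and_outliers(visits, max_visits=3)
```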
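Illustrative sketch: trimming down or filling up a top-s set

The notes on the reduced datasets mention the randomness of trimming down or filling up the top-s sets of topics so that each set has s topics, which is why some results vary between runs. The sketch below is one possible reading of that step, not the procedure used in the notebooks: topics are ranked by frequency, ties are broken at random when trimming, and random unobserved topics are added when filling. The function name top_s_topics, the toy taxonomy, and the tie-breaking rule are assumptions.

```python
import random
from collections import Counter


def top_s_topics(visited_topics, taxonomy, s, rng):
    """Return exactly s topics for one individual (hypothetical helper).

    Assumes the taxonomy contains at least s topics.
    """
    counts = Counter(visited_topics)
    observed = list(counts)
    rng.shuffle(observed)                     # random order among equal counts
    observed.sort(key=lambda t: -counts[t])   # stable sort keeps the shuffled tie order
    top = observed[:s]                        # trimming down
    if len(top) < s:                          # filling up
        unobserved = [t for t in taxonomy if t not in counts]
        top += rng.sample(unobserved, s - len(top))
    return top


# Toy taxonomy and histories; all values are made up.
taxonomy = ["Arts", "News", "Sports", "Travel", "Games", "Food"]
rng = random.Random(0)
trimmed = top_s_topics(["News", "News", "Sports", "Arts", "Travel"], taxonomy, 2, rng)
filled = top_s_topics(["Food"], taxonomy, 3, rng)
```

In the first call only "News" is certain to be selected; the second topic depends on the random tie-break. In the second call "Food" is kept and two further topics are drawn at random, so repeated runs with different seeds can yield different sets.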

Related Organizations

Universidade Federal de Minas Gerais
Brazil
Macquarie University
Australia

Keywords

Topics API, Third-Party Cookies, QIF, Microdata, Quantitative Information Flow
