Powered by OpenAIRE graph
Dataset
Data sources: ZENODO

Overcoming Language Barriers: Multilingual Analysis of the 2023 Swiss Privacy Law's Impact

Authors: Nenadic, Luka;

Abstract

This repository contains the complete research pipeline for automatically analyzing privacy policies in a multilingual setting (validated for English, German, French, and Italian). The project operationalizes this pipeline in the context of (1) a revision in Swiss privacy law and (2) the use of automated policy generators.

Repository Structure

This repository is organized into three main subdirectories, each serving a specific purpose in the research pipeline:

1. Analysis

The Analysis directory contains most statistical analyses and data processing scripts.

Key Components:
- Data preprocessing (including text cleaning and removal of personal identifiers with Presidio)
- Creation of the three relevant groups for the analysis (CH, CH & EU, and EU)
- Main statistical and semantic analysis scripts presented in the paper
- Analysis of policy clusters based on generator use

2. Data

The Data directory contains the relevant datasets and corpora used throughout the paper.

Key Components:
- Annotated datasets:
  1. LLM-annotated full original dataset after removal of personal identifiers with Presidio ("swiss-gdpr_annotated.parquet")
  2. Final annotated and grouped dataset used for all statistical analyses ("swiss-gdpr_annotated_groups.parquet")
  3. Log of the LLM annotations ("run.log")
- CrUX dataset used for website popularity rankings as well as the website budgeting list used to scrape the initial dataset
- Embeddings of the policies used for the cluster analysis

3. LLM

The LLM directory contains all files related to the LLM-based data analysis.

Key Components:
- The codebooks, human annotations (reference benchmark), and evaluations for all three initial annotation phases ("Annotations")
- The validation of the models against the final set of human annotations ("Validation")
- The scripts for the large-scale policy evaluation using OpenAI's GPT-5 ("Evaluation")

Citation

If you use this work in your research, please cite it as:

```
Accepted at PETS '26 [Citation details to be added upon publication]
```

We kindly ask you to cite the paper and not the dataset itself. Please find a more detailed list of funding sources in the paper's Acknowledgments section.

License

This work and its artifacts are licensed under a CC-BY 4.0 license.

Contact

For questions about this research, please contact Luka Nenadic at lnenadic@ethz.ch.
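
As a quick check after downloading the archive, the layout described under Repository Structure can be verified with a short script. The following is a minimal sketch, not part of the repository: the directory and file names are taken from the description above, but their exact placement (for example, that the parquet files and "run.log" sit directly inside the Data directory) is an assumption, and `missing_entries` is a hypothetical helper name.

```python
from pathlib import Path

# Names come from the repository description; where exactly they sit
# inside the archive is an assumption and may need adjusting.
EXPECTED_LAYOUT = {
    "Analysis": [],
    "Data": [
        "swiss-gdpr_annotated.parquet",
        "swiss-gdpr_annotated_groups.parquet",
        "run.log",
    ],
    "LLM": ["Annotations", "Validation", "Evaluation"],
}


def missing_entries(root):
    """Return the expected paths (relative to root) that do not exist."""
    root = Path(root)
    missing = []
    for directory, children in EXPECTED_LAYOUT.items():
        if not (root / directory).is_dir():
            missing.append(directory)
            continue  # children cannot exist without their parent directory
        for child in children:
            if not (root / directory / child).exists():
                missing.append(f"{directory}/{child}")
    return missing
```

Calling `missing_entries(".")` from the unpacked repository root returns an empty list when everything listed above is present. The parquet files can then be opened with any parquet reader, for instance `pandas.read_parquet`, which requires `pyarrow` or `fastparquet` to be installed.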
