Building a Python library for anonymizing sensitive data

Technologies for handling large amounts of data have grown rapidly in recent years, mainly because large volumes of data (big data) have become so easily accessible. This growth has made it harder to balance privacy against preserving as much information as possible. The dilemma is intensified when dealing with databases that contain, for example, patients' medical data. The objective of this master's thesis is to address this problem by exploring and implementing some of the most common anonymization techniques. More specifically, we intend to implement a Python library containing some of the most popular anonymization models, namely k-anonymity, l-diversity and t-closeness, and to offer a set of analysis metrics to support their optimal application.
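To make the three models named in the abstract concrete, the following is a minimal sketch of how each property could be measured on an already generalized table. It is not the thesis library's API: the function names, the toy records, and the column names are all hypothetical, and the t-closeness distance used here is the total variation distance, which assumes a categorical sensitive attribute with equal ground distance between values.

```python
from collections import Counter, defaultdict

# Hypothetical toy table, already generalized (age ranges, masked ZIP codes).
records = [
    {"age": "30-39", "zip": "390**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "390**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "391**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "391**", "diagnosis": "flu"},
]


def equivalence_classes(rows, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Largest k satisfied: the size of the smallest equivalence class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len(group) for group in groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len({r[sensitive] for r in group}) for group in groups.values())


def t_closeness(rows, quasi_identifiers, sensitive):
    """Smallest t satisfied: the largest distance between a class's
    sensitive-value distribution and the whole table's distribution.
    Assumption: total variation distance on a categorical attribute."""
    overall = Counter(r[sensitive] for r in rows)
    n = len(rows)
    worst = 0.0
    for group in equivalence_classes(rows, quasi_identifiers).values():
        local = Counter(r[sensitive] for r in group)
        distance = 0.5 * sum(
            abs(local[value] / len(group) - overall[value] / n)
            for value in overall
        )
        worst = max(worst, distance)
    return worst


qi = ["age", "zip"]
print("k =", k_anonymity(records, qi))
print("l =", l_diversity(records, qi, "diagnosis"))
print("t =", t_closeness(records, qi, "diagnosis"))
```

On this toy table every equivalence class has two records, so it is 2-anonymous, but the second class contains a single diagnosis, so it is only 1-diverse. This illustrates why k-anonymity alone may not protect the sensitive attribute.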

Master's in Data Science

Country

Spain

Related Organizations

University of Cantabria
Spain

Keywords

t-closeness, Python library, Privacy, Sensitive data, k-anonymity, l-diversity
