Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research

Keywords: Machine Learning, FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Artificial Intelligence, Computers and Society (cs.CY), Computers and Society, Machine Learning (cs.LG)

Cooper, A. Feder; Choquette-Choo, Christopher A.; Bogen, Miranda; Klyman, Kevin; Jagielski, Matthew; Filippova, Katja; Liu, Ken; Chouldechova, Alexandra; Hayes, Jamie; Huang, Yangsibo; Triantafillou, Eleni; Kairouz, Peter; Mitchell, Nicole Elyse; Mireshghallah, Niloofar; Jacobs, Abigail Z.; Grimmelmann, James; Shmatikov, Vitaly; De Sa, Christopher; Shumailov, Ilia; Terzis, Andreas; Barocas, Solon; Vaughan, Jennifer Wortman; Boyd, Danah; Choi, Yejin; Koyejo, Sanmi; Delgado, Fernando; Liang, Percy; Ho, Daniel E.; Samuelson, Pamela; Brundage, Miles; Bau, David; Neel, Seth; Wallach, Hanna; Cyphert, Amy B.; Lemley, Mark A.; Papernot, Nicolas; Lee, Katherine

arXiv.org e-Print Archive

Preprint . 2024

Data sources: arXiv.org e-Print Archive

https://doi.org/10.2139/ssrn.5...

Article . 2025 . Peer-reviewed

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2024

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

Publication . Article, Preprint . 01 Jan 2025 . Embargo end date: 01 Jan 2024 . Publisher: Elsevier BV

doi: 10.2139/ssrn.5288768 , 10.48550/arxiv.2412.06966

arXiv: 2412.06966

Abstract

"Machine unlearning" is a popular proposed solution for mitigating the existence of content in an AI model that is problematic for legal or moral reasons, including privacy, copyright, safety, and more. For example, unlearning is often invoked as a solution for removing the effects of specific information from a generative-AI model's parameters, e.g., a particular individual's personal data or the inclusion of copyrighted content in the model's training data. Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs, e.g., generations that closely resemble a particular individual's data or reflect the concept of "Spiderman." Both of these goals--the targeted removal of information from a model and the targeted suppression of information from a model's outputs--present various technical and substantive challenges. We provide a framework for ML researchers and policymakers to think rigorously about these challenges, identifying several mismatches between the goals of unlearning and feasible implementations. These mismatches explain why unlearning is not a general-purpose solution for circumscribing generative-AI model behavior in service of broader positive impact.

NeurIPS 2025 (Oral)

Related Organizations

Yale University, United States
University of Michigan–Flint, United States
Carnegie Mellon University, United States
Washington State University, United States
Cornell University, United States

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green

Related to Research communities

Knowmad Institut

UArctic