[Re] Badder Seeds: Reproducing the Evaluation of Lexical Methods for Bias Measurement

Publication: Article, Preprint
Date: 01 Jan 2022
Embargo end date: 01 Jan 2022
Publisher: Zenodo
Journal: CoRR, volume abs/2206.01767

Authors: Jille van der Togt; Lea Tiyavorabun; Matteo Rosati; Giulio Starace

doi: 10.5281/zenodo.6574704, 10.48550/arxiv.2206.01767, 10.5281/zenodo.6574705

arXiv: 2206.01767

Abstract

Combating bias in NLP requires bias measurement. Bias measurement is almost always achieved by using lexicons of seed terms, i.e. sets of words specifying stereotypes or dimensions of interest. This reproducibility study focuses on the original authors' main claim that the rationale for constructing these lexicons needs thorough checking before use, as the seeds used for bias measurement can themselves exhibit biases. The study aims to evaluate the reproducibility of the quantitative and qualitative results presented in the paper and of the conclusions drawn from them. We reproduce most of the results supporting the original authors' general claim: seed sets often suffer from biases that affect their performance as a baseline for bias metrics. Generally, our results mirror the original paper's. They differ slightly on select occasions, but not in ways that undermine the paper's general intent to show the fragility of seed sets.
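
As a rough illustration of why the choice of seeds matters, the sketch below scores words against a direction defined by two seed sets. It is not the metric used in the original paper or in this reproduction: the embedding values, the seed words, and the helper names (`mean_vector`, `cosine`, `bias_score`) are all hypothetical and chosen only to keep the example small.

```python
import math

# Toy 3-dimensional "embeddings". These values are invented for
# illustration and do not come from the paper or from any trained model.
EMBEDDINGS = {
    "he": [1.0, 0.1, 0.0],
    "man": [0.9, 0.2, 0.1],
    "she": [-1.0, 0.1, 0.0],
    "woman": [-0.9, 0.2, 0.1],
    "king": [0.6, 0.5, 0.5],
    "queen": [-0.6, 0.5, -0.3],
    "engineer": [0.4, 0.8, 0.2],
    "nurse": [-0.5, 0.7, 0.3],
}


def mean_vector(words):
    """Average the embeddings of a seed set, dimension by dimension."""
    vectors = [EMBEDDINGS[w] for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bias_score(word, seeds_a, seeds_b):
    """Cosine between a word and the direction (mean of A) - (mean of B).

    Positive scores lean toward seed set A, negative toward seed set B.
    """
    mean_a = mean_vector(seeds_a)
    mean_b = mean_vector(seeds_b)
    direction = [a - b for a, b in zip(mean_a, mean_b)]
    return cosine(EMBEDDINGS[word], direction)


# Two hypothetical seed lexicons meant to capture the same dimension.
pronoun_seeds = (["he", "man"], ["she", "woman"])
royalty_seeds = (["king"], ["queen"])

for target in ("engineer", "nurse"):
    print(target,
          round(bias_score(target, *pronoun_seeds), 3),
          round(bias_score(target, *royalty_seeds), 3))
```

In this toy setup the score for the same target word changes when one seed lexicon is swapped for another, which is the kind of sensitivity the abstract refers to when it describes seed sets as fragile.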

15 pages, 7 figures

Related Organizations

University of Amsterdam
Netherlands

Keywords

FOS: Computer and information sciences, Computer Science - Computation and Language, bias, Computer Science - Artificial Intelligence, deep learning, seeds, nlp, python, Computer Science - Computers and Society, Artificial Intelligence (cs.AI), machine learning, rescience c, Computers and Society (cs.CY), pytorch, Computation and Language (cs.CL)

Impact by BIP!

selected citations: 0 (citations derived from selected sources; an alternative to the "influence" indicator)
popularity: Average (the "current" impact or attention of an article in the research community at large, based on the underlying citation network)
influence: Average (the overall impact of an article in the research community at large, based on the underlying citation network, diachronically)
impulse: Average (the initial momentum of an article directly after its publication, based on the underlying citation network)