Notice
On May 12th a massive fire broke out at Marywilska 44, one of the Warsaw's largest shopping centers. This unexpected incident caused power outages in the surrounding area including the ICM data center where OpenAIRE services run. Efforts are underway to restore the power and to get services back online. Some functionalities will not work properly until then.
We apologize for any inconvenience this may have caused.
Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ Edinburgh Research E...arrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
Edinburgh Research Explorer
Contribution for newspaper or weekly magazine . 2022
https://doi.org/10.18653/v1/20...
Conference object . 2022 . Peer-reviewed
Data sources: Crossref
versions View all 2 versions
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

Mind the data gap(s): Investigating power in speech and language datasets

Authors: Markl, Nina;

Mind the data gap(s): Investigating power in speech and language datasets

Abstract

Algorithmic oppression is an urgent and persistent problem in speech and language technologies. Considering power relations embedded in datasets before compiling or using them to train or test speech and language technologies is essential to designing less harmful, more just technologies. This paper presents a reflective exercise to recognise and challenge gaps and the power relations they reveal in speech and language datasets by applying principles of Data Feminism and Design Justice, and building on work on dataset documentation and sociolinguistics.

Country
United Kingdom
Related Organizations
70 references, page 1 of 7

Abubakar Abid, Maheen Farooqi, and James Zou. 2021. Persistent anti-muslim bias in large language models. CoRR, abs/2101.05783.

McKane Andrus, Elena Spitzer, Jeffrey Brown, and Alice Xiang. 2021. What we cant measure, we cant understand. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. ACM.

R. Ardila, M. Branson, K. Davis, M. Henretty, M. Kohler, J. Meyer, R. Morais, L. Saunders, F. M. Tyers, and G. Weber. 2020. Common voice: A massively-multilingual speech corpus. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), pages 4211-4215.

Avi Asher-Schapiro and David Sherfinski. 2021. U.S. prisons are installing AI-powered surveillance to ifght crime, documents seen by the Thomson Reuters Foundation show, but critics say privacy rights are being trampled. Thomson Reuters Foundation News.

Solon Barocas, Moritz Hardt, and Arvind Narayanan. 2019. Fairness and Machine Learning. fairmlbook.org. http://www.fairmlbook.org.

Eric P.S. Baumer and M. Six Silberman. 2011. When the implication is not to design (technology). In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '11, pages 2271- 2274. Association for Computing Machinery.

Emily M. Bender and Batya Friedman. 2018. Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. Transactions of the Association for Computational Linguistics, 6:587-604.

Emily M. Bender, Timnit Gebru, Angelina McMillanMajor, and Shmargaret Shmitchell. 2021. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT '21, page 610-623, New York, NY, USA. Association for Computing Machinery. [OpenAIRE]

Garfield Benjamin. 2021. What we do with data: A performative critique of data 'collection'. Internet Policy Review, 10(4).

Ruha Benjamin. 2019. Race after technology : abolitionist tools for the New Jim Code. Polity Press, Newark. [OpenAIRE]

  • BIP!
    Impact byBIP!
    citations
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    1
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
  • citations
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    1
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
    Powered byBIP!BIP!
Powered by OpenAIRE graph
Found an issue? Give us feedback
citations
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
1
Average
Average
Average
Green