Analyzing Discourses in Portuguese Word Embeddings: A Case of Gender Bias Outside the English-Speaking World

Keywords: Computational Linguistics, TK7885-7895, QA76.75-76.765, Computer engineering. Computer hardware, Non-English NLP, Algorithmic Sexism, Computer software, Ethics in AI, Natural Language Processing

Fernanda Tiemi de Souza Taso; Valéria Quadros dos Reis; Fábio Viduani Martinez

Journal on Interactive Systems

Article . 2025 . Peer-reviewed

License: CC BY

Published 14 Jul 2025 by Sociedade Brasileira de Computação in the Journal on Interactive Systems, volume 16, pages 532-543 (eISSN: 2763-7719).

doi: 10.5753/jis.2025.5958

Abstract

In this paper, we examined a Portuguese word embedding model to identify gender biases from several analytical perspectives, using the SC-WEAT and RIPA metrics, which are widely used for English. Our inquiry focused on three primary dimensions: (1) the frequency-based association of words with feminine and masculine terms; (2) the identification of disparities between grammatical classes with respect to the gender sets; and (3) the categorisation and grouping of feminine and masculine words, including their distinctive attributes. Across frequency groups, most subsets showed a pervasive negative association with feminine terms, indicating that the model's vocabulary leans markedly towards masculine references. Notably, among the 100 most frequent words, 89 were more strongly associated with masculine terms. In the analysis of grammatical classes, adjectives were predominantly associated with feminine references, which suggests a tendency to add supplementary description when referring to women. Furthermore, participle verbs were conspicuously associated with feminine terms, a pattern not observed for masculine terms that requires further expert attention to be properly explained. The categorisation process also revealed gender bias: words from the domains of sport, finance, and science were associated with masculine terms, while words related to feelings, home furniture, and entertainment were associated with feminine terms. These findings contribute to a discussion of gender analysis in non-English models, such as Portuguese ones, and aim to encourage the Brazilian community to actively investigate biases in NLP models.
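The abstract names SC-WEAT and RIPA without defining them. The following is a minimal sketch in Python with NumPy, not the authors' code. It assumes the common SC-WEAT effect size (mean cosine similarity of a word to one attribute set minus its mean similarity to the other, divided by the standard deviation of all those similarities) and a RIPA score computed as the inner product of a word vector with a unit gender-relation vector derived from (feminine, masculine) word pairs. The function names, the use of the sample standard deviation, the uncentred SVD used to obtain the relation vector, and the 3-dimensional toy vectors (including the words `futebol` and `casa`) are all illustrative assumptions and do not come from the paper's model. With the feminine set passed first, a negative score means a word leans masculine, which is one way a tally such as "89 of the 100 most frequent words" could be computed.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def sc_weat(w: np.ndarray, fem, masc) -> float:
    """SC-WEAT effect size of word vector w against two attribute sets.

    Positive -> w is closer to `fem`; negative -> w is closer to `masc`.
    """
    sims_f = np.array([cosine(w, a) for a in fem])
    sims_m = np.array([cosine(w, b) for b in masc])
    # Sample standard deviation (ddof=1) over the union of both sets (assumption).
    spread = np.std(np.concatenate([sims_f, sims_m]), ddof=1)
    return float((sims_f.mean() - sims_m.mean()) / spread)


def ripa_direction(pairs) -> np.ndarray:
    """Unit relation vector from (feminine, masculine) vector pairs.

    Assumption: the top right singular vector of the uncentred difference
    matrix; for a single pair this is just the normalised difference vector.
    """
    diffs = np.array([f - m for f, m in pairs])
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    b = vt[0]
    # Singular vectors have arbitrary sign: orient b so that
    # feminine-minus-masculine differences project positively.
    if np.dot(diffs.mean(axis=0), b) < 0:
        b = -b
    return b / np.linalg.norm(b)


def ripa(w: np.ndarray, b: np.ndarray) -> float:
    """RIPA score: inner product of the (unnormalised) word vector with b."""
    return float(np.dot(w, b))


# Hypothetical toy vectors; axis 0 loosely plays the role of a gender direction.
fem = [np.array([1.0, 0.2, 0.0]), np.array([0.9, 0.0, 0.3])]
masc = [np.array([-1.0, 0.1, 0.0]), np.array([-0.8, 0.0, 0.2])]
vocab = {
    "futebol": np.array([-0.7, 0.5, 0.1]),
    "casa": np.array([0.6, 0.4, 0.2]),
}

b = ripa_direction(list(zip(fem, masc)))
scores = {word: (sc_weat(v, fem, masc), ripa(v, b)) for word, v in vocab.items()}
masculine_leaning = sum(1 for es, _ in scores.values() if es < 0)

for word, (es, r) in scores.items():
    print(f"{word}: SC-WEAT={es:+.3f}  RIPA={r:+.3f}")
print(f"{masculine_leaning} of {len(vocab)} words lean masculine")
```

Swapping the two attribute sets only flips the sign of the SC-WEAT score, so which set is passed first merely fixes the sign convention; RIPA's sign is likewise fixed by orienting the relation vector from feminine towards masculine differences as shown.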

Related Organizations

Leuphana University of Lüneburg, Germany
Federal University of Mato Grosso do Sul, Brazil
