DIGITAL.CSIC
Article . 2025 . Peer-reviewed
Data sources: DIGITAL.CSIC
DBLP
Article . 2025
Data sources: DBLP

Complex Word Identification for Lexical Simplification in Spanish Texts for Patients

Authors: Ortega Riba, Federico; Campillos-Llanos, Leonardo; Samy, Doa

Abstract

This work describes the task of complex word identification (CWI) in Spanish medical texts for patients. Identifying complex words is the first step in lexical simplification, which aims to bridge the language gap between patients and healthcare professionals, enable access to information, and ensure unambiguous terminology for clear and effective communication. As part of the task, we created an annotation guideline for medical complex words and compiled a corpus of 225 texts (162,575 tokens). A total of 18,203 complex words (single-word and multi-word) were manually labeled; each text was annotated by two linguists, with high inter-annotator agreement (F1 = 84.42%). The corpus was used to train two machine learning classifiers (Support Vector Machines and Logistic Regression) as baselines, along with seven deep learning transformer models. The models were selected according to two factors: language (Spanish or multilingual) and domain (general or medical). On the test set, the best-performing transformer model achieves an average F1 score of 79.02 (±0.65).
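
The abstract reports two kinds of F1 figures: agreement between two annotators, and a model score averaged over runs with a ± spread. The abstract does not say how either was computed, so the following is only a minimal sketch under stated assumptions: agreement is taken as exact-match F1 over annotated spans, with one annotator treated as the reference, and the ± value is taken to be a sample standard deviation across runs. The names `span_f1` and `summarize_runs`, the character-offset span representation, and all the numbers are hypothetical and invented for illustration; none of them come from the corpus or the paper.

```python
from statistics import mean, stdev
from typing import List, Set, Tuple

# Assumed representation: a complex word is a (start, end) character span.
Span = Tuple[int, int]


def span_f1(reference: Set[Span], candidate: Set[Span]) -> float:
    """Exact-match F1 between two sets of annotated spans.

    Assumption: a span counts as agreed only if both offsets are identical.
    """
    if not reference and not candidate:
        return 1.0  # nothing annotated by either side: trivial agreement
    matched = len(reference & candidate)
    if matched == 0:
        return 0.0
    precision = matched / len(candidate)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


def summarize_runs(scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-run scores.

    Assumption: the reported spread is a standard deviation over runs.
    """
    return mean(scores), stdev(scores)


# Toy annotations, invented for illustration only.
annotator_a = {(0, 12), (20, 31), (40, 48), (60, 72)}
annotator_b = {(0, 12), (20, 31), (41, 48), (60, 72), (80, 85)}

agreement = span_f1(annotator_a, annotator_b)
run_mean, run_spread = summarize_runs([78.0, 79.0, 80.0])
print(f"agreement F1 = {agreement:.4f}")
print(f"model F1 = {run_mean:.2f} (±{run_spread:.2f})")
```

Exact-match F1 is symmetric: swapping the two annotators swaps precision and recall but leaves the F1 value unchanged, which is one reason it is convenient as an agreement measure when neither annotator is a gold standard. A relaxed (overlap-based) match would give higher values and would need a different matching rule than the set intersection used here.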

Peer reviewed

Country
Spain
Keywords

Language Resources, Corpora, Computational Linguistics, Automatic Text Simplification

  • BIP! impact indicators
    Selected citations: 0
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Popularity: Average
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
Open access route: Green