
handle: 10486/715714
Este estudio se centra en la extracción automática de términos específicos del dominio de informes financieros españoles utilizando los modelos de lenguaje BERT y RoBERTa, tanto monolingües como multilingües. Evaluamos el rendimiento de los modelos, enfocándonos en su habilidad para generalizar términos no mostrados durante el entrenamiento. Enriquecemos esta evaluación con un análisis exhaustivo de los falsos positivos, falsos negativos y verdaderos positivos. Además, empleamos el análisis de redes sociales como propuesta para organizar sistemáticamente los términos extraídos en agrupaciones con cierta relevancia. Nuestros hallazgos indican que los modelos de lenguaje tipo transformer son una opción rentable para la identificación de este tipo de términos y muestran cómo su agrupación permite organizar los términos financieros en grupos coherentes y significativos
This study focuses on automatic term extraction to detect domain-specific terms in Spanish financial reports using monolingual and multilingual BERT and RoBERTa language models. We evaluated the performance of the models, paying particular attention to their ability to identify terms that were not seen during training. We complemented this evaluation with a thorough analysis of false positives, false negatives, and true positives. In addition, we applied social network analysis techniques to systematically organize the extracted terms into relevant clusters. Our findings indicate that transformer language models are a cost-effective choice for identifying such terms and show how clustering allows us to organize them into coherent and meaningful groups.
The datasets that support the findings of this study are archived in e‐cienciaDatos, the data repository of the Universidad Autónoma de Madrid, at https://doi.org/10.21950/ZF4PKF, https://doi.org/10.21950/JXFKRB, https://doi.org/10.21950/WRH0SO, https://doi.org/10.21950/2JOAZJ and https://doi.org/10.21950/FWEML6
This publication is part of the project “Computational linguistic methods for readability and simplification of financial narratives” (CLARA-FINT, PID2020-116001RBC31), funded by the Spanish Ministry of Science and Innovation and the State Research Agency.
community detection, term extraction, financial concepts, philology
