Extraction and structuring of financial terminology

Publication: Article, 01 Jan 2024, Spain, English
Publisher: Sociedad Española para el Procesamiento del Lenguaje Natural
Journal: Proces. del Leng. Natural, volume 73, pages 139-149

Authors: Porta Zamorano, Jordi; Carbajo Coronado, Blanca; Moreno Sandoval, Antonio

handle: 10486/715714

Abstract

This study focuses on automatic term extraction to detect domain-specific terms in Spanish financial reports using monolingual and multilingual BERT and RoBERTa language models. We have evaluated the performance of the models, paying particular attention to their ability to identify terms that were not present during training. Additionally, we have conducted a thorough analysis of false positives, false negatives, and true positives. To further enhance our analysis, we have employed social network analysis techniques to systematically organize the extracted terms into relevant clusters. Our findings show that transformer language models are a cost-effective choice for identifying such terms, and that clustering allows us to organize them into coherent and meaningful groups.
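
The abstract describes two technical steps: scoring extracted terms (including terms absent from training, via true positives, false positives, and false negatives) and grouping terms into clusters with social network analysis. The Python sketch below illustrates both under stated assumptions; it is not the authors' code. It assumes predicted and gold terms are compared as exact strings (the BERT/RoBERTa extraction itself is not reproduced), builds a hypothetical term co-occurrence graph from documents, and uses label propagation as a stand-in community detection algorithm, since the specific algorithm is not named here. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def score_terms(predicted, gold, train_terms):
    """Set-based scoring of extracted terms (hypothetical helper).

    Assumption: terms are compared as exact strings. Also reports recall
    restricted to gold terms that never appeared in the training terms.
    """
    predicted, gold, train_terms = set(predicted), set(gold), set(train_terms)
    tp, fp, fn = predicted & gold, predicted - gold, gold - predicted
    precision = len(tp) / len(predicted) if predicted else 0.0
    recall = len(tp) / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    unseen = gold - train_terms  # gold terms the model never saw in training
    unseen_recall = len(unseen & predicted) / len(unseen) if unseen else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision,
            "recall": recall, "f1": f1, "unseen_recall": unseen_recall}


def cooccurrence_graph(documents):
    """Weighted undirected graph: two terms are linked per document they share."""
    graph = defaultdict(Counter)
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_iter=100):
    """Deterministic asynchronous label propagation (ties -> smallest label).

    Stand-in community detection; returns clusters sorted by their members.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            # Sum edge weights per neighbouring label; updates apply immediately.
            weights = Counter()
            for neighbour, w in graph[node].items():
                weights[labels[neighbour]] += w
            if not weights:
                continue
            best = max(weights.values())
            new_label = min(lab for lab, w in weights.items() if w == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    clusters = defaultdict(set)
    for node, label in labels.items():
        clusters[label].add(node)
    return sorted(clusters.values(), key=sorted)
```

For example, with gold terms {capital, dividend, bond, yield}, training terms {capital, bond}, and predictions {capital, dividend, bond, report}, precision, recall, and F1 are all 0.75, and recall on unseen terms is 0.5 (dividend is found, yield is missed). Label propagation is used here only because it needs no external library; modularity-based methods such as Louvain are common alternatives.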

The datasets that support the findings of this study are archived in e-cienciaDatos, the data repository of the Universidad Autónoma de Madrid: https://doi.org/10.21950/ZF4PKF, https://doi.org/10.21950/JXFKRB, https://doi.org/10.21950/WRH0SO, https://doi.org/10.21950/2JOAZJ and https://doi.org/10.21950/FWEML6

This publication is part of the project CLARA-FINT, “Computational linguistic methods for readability and simplification of financial narratives” (PID2020-116001RBC31), funded by the Spanish Ministry of Science and Innovation and the State Research Agency.

Country

Spain

Related Organizations

Complutense University of Madrid, Spain
Autonomous University of Madrid, Spain

Keywords

community detection, term extraction, financial concepts, philology

Impact (by BIP!)

- Selected citations: 0. Citations derived from selected sources; an alternative to the "Influence" indicator.
- Popularity: Average. The current impact/attention (the "hype") of the article in the research community, based on the underlying citation network.
- Influence: Average. The overall impact of the article in the research community, based on the underlying citation network over time.
- Impulse: Average. The initial momentum of the article directly after its publication, based on the underlying citation network.
