Powered by OpenAIRE graph
Biblos-e Archivo
Conference object, 2023
Data sources: Biblos-e Archivo

A Spanish-English parallel corpus from financial reports

Authors: Torterolo Orta, Yanco Amor; Roseti, Sofía; Carbajo Coronado, Blanca; Moreno Sandoval, Antonio

Abstract

We present the process of compiling a parallel corpus from financial reports in Spanish and their English translations, downloaded from the websites of the IBEX-35 companies. Our aim is to create a segmented, aligned bilingual corpus for linguistic and translation studies and for building linguistic resources for AI. Extracting and structuring the information is always the biggest challenge when compiling a corpus from PDF documents, because the information is laid out in several columns with a non-linear organisation, which hinders automated text extraction. We describe our method for extracting the narrative elements, cleaning the extracted text and aligning the Spanish and English paragraphs. The result is a CSV file containing both languages. We used 15 bilingual reports, yielding 1,678,426 words and 56,170 segments in Spanish, and 1,452,636 words and 56,813 segments in English.
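The record does not describe how the paragraph alignment is computed. As an illustration only, the Python sketch below shows one common way to align two lists of paragraphs and store the result as a two-column CSV: a simplified length-based dynamic-programming aligner in the spirit of Gale and Church (1993). The bead types, penalties, the log-length-ratio cost and the names `align` and `to_csv` are all hypothetical choices for this sketch, not the authors' method.

```python
import csv
import io
import math

# Hypothetical bead types (source paragraphs, target paragraphs) and their
# extra penalties; 1:0 and 0:1 beads absorb paragraphs found in one version only.
BEADS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def bead_cost(src_len: int, tgt_len: int, penalty: float) -> float:
    """Heuristic cost: absolute log ratio of character lengths plus a bead penalty."""
    return abs(math.log((src_len + 1) / (tgt_len + 1))) + penalty


def align(src: list[str], tgt: list[str]) -> list[tuple[str, str]]:
    """Align two paragraph lists by minimising total bead cost (dynamic programming)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(p) for p in src[i:ni])
                t_len = sum(len(p) for p in tgt[j:nj])
                c = cost[i][j] + bead_cost(s_len, t_len, penalty)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the aligned segments.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((" ".join(src[pi:i]), " ".join(tgt[pj:j])))
        i, j = pi, pj
    pairs.reverse()
    return pairs


def to_csv(pairs: list[tuple[str, str]]) -> str:
    """Serialise aligned pairs as a two-column CSV with an es/en header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["es", "en"])
    writer.writerows(pairs)
    return buf.getvalue()


if __name__ == "__main__":
    es = ["Los ingresos crecieron.", "El margen mejoró."]
    en = ["Revenue grew and the margin improved."]
    print(to_csv(align(es, en)))
```

In this toy example the two Spanish paragraphs are merged into one 2:1 segment because their combined length closely matches the single English paragraph. Character length is used as the alignment signal because a paragraph and its translation tend to have correlated lengths; on real reports the penalties would need tuning, and lexical cues (numbers, named entities) are often added.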

This publication is part of the project "Computational linguistic methods for the readability and simplification of financial narratives" (CLARA-FINT, PID2020-116001RB-C31), funded by the Spanish Ministry of Science and Innovation and the State Research Agency.

The dataset that supports the findings of this study is archived in the Universidad Autónoma de Madrid data repository e-cienciaDatos at https://doi.org/10.21950/85MWYP.

Country
Spain
Keywords

Computer science, bilingual corpus, parallel corpus, compilation, financial domain
