
handle: 10347/39335
The objective of this work is to provide researchers in the field of corpus linguistics with proper documentation on the PaCorES project (www.pacores.eu). The PaCorES project was created with the aim of building a collection of bidirectional parallel bilingual corpora with Spanish as the central language. The corpora currently included in the collection, in order of creation, are as follows:
1) The Parallel Corpus German-Spanish, PaGeS, www.corpuspages.eu
2) The Parallel Corpus English-Spanish, PaEnS, www.corpuspaens.eu
3) The Parallel Corpus Chinese-Spanish, PaCheS, www.corpuspaches.eu
4) The Parallel Corpus French-Spanish, PaFreS, www.corpuspafres.eu
First, the authors identify the gaps and deficiencies in the landscape of bilingual and multilingual parallel corpora that include Spanish as one of the languages. Additionally, they highlight the inclusion of a particularly rare language pair, Chinese/Spanish, which has great potential due to the number of users. Next, they present the criteria that guided the design and architecture of the corpora to overcome these deficiencies. The paper emphasizes that the PaCorES corpora are fully accessible and stable: they can be freely consulted online without restrictions, and stability is guaranteed because the corpora are published successively in clearly identified versions. Currently, the core PaCorES corpora comprise a collection of contemporary prose texts, mostly fiction. This type of text is underrepresented in parallel corpora because such texts are difficult to obtain. The texts offer proven quality thanks to editorial control, and their translations have been carried out by professionals. The corpora are annotated with detailed metatextual information, documenting not only the complete source of the texts but also other data such as the translation direction, the degree of literalness, and the translator's intervention.
The next section is dedicated to the alignment process: the different software used, the F1 score achieved, and the manual review of the alignments. The search architecture is then explained, emphasizing the availability of three levels of search to accommodate different user needs, and detailing the functionalities of the interface and the presentation of results. Finally, the authors highlight that not only the individual components of PaCorES but also the project as a whole are designed with flexibility in mind. New language pairs can be added within the same collection architecture, and new texts can be incorporated into the individual components. The authors conclude that all these features make the PaCorES corpora a truly multifunctional resource that meets the needs of a wide variety of users. The corpora serve specialists in fields such as NLP (Natural Language Processing), lexicography, contrastive linguistics, translation studies, and language teaching and translation. Moreover, the ease of use of the search and visualization functions, along with the fast retrieval speed, allows the PaCorES collection to be used as an educational resource in language and translation teaching. In this context, intermediate to advanced students can discover numerous translation suggestions for a given term, presented directly through reliable usage examples.
Corpus multifunctionality, Corpus alignment, Parallel corpora, Corpus applications, Bidirectional corpora, 5701 Applied linguistics (Lingüística aplicada), Corpus compilation
