Publication . Conference object . Article . 2020

Dataset for Temporal Analysis of English-French Cognates

Frossard, Esteban; Coustaty, Mickael; Doucet, Antoine; Jatowt, Adam; Hengchen, Simon
Open Access
Published: 13 May 2020
Publisher: Zenodo
Languages change over time and, thanks to the abundance of digital corpora, the computational analysis of their evolution has recently gained much research attention. In this paper, we focus on creating a dataset to support investigation of the similarity in evolution between different languages. In particular, we examine the similarities and differences in the use of corresponding words across time in English and French, two languages from different linguistic families yet with shared syntax and close contact. To this end, we select a set of cognates in both languages and study their frequency changes and correlations over time. We propose a new dataset for computational approaches to the synchronized diachronic investigation of language pairs, and subsequently present novel findings stemming from the cognate-focused diachronic comparison of the two chosen languages. To the best of our knowledge, the present study is the first in the literature to use computational approaches and large-scale data to perform a cross-language diachronic analysis.
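
The abstract mentions studying the frequency changes of cognates and their correlations over time. As a rough illustration of what such a comparison can look like, the sketch below normalizes raw counts by corpus size per period and computes a Pearson correlation between the two resulting series. This is only a minimal sketch under stated assumptions: the record does not say which correlation measure or normalization the authors used, the function names are hypothetical, and the counts are invented toy values rather than data from the dataset.

```python
from math import sqrt


def relative_frequencies(counts, totals):
    """Divide each period's raw count by that period's corpus size."""
    if len(counts) != len(totals):
        raise ValueError("counts and totals must have the same length")
    return [c / t for c, t in zip(counts, totals)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented toy values (NOT from the dataset): occurrences of one cognate
# pair per time period, and the total token count of each period's corpus.
en_counts = [120, 150, 210, 260, 330]
fr_counts = [80, 95, 140, 170, 230]
en_totals = [1_000_000] * 5
fr_totals = [800_000] * 5

en_freq = relative_frequencies(en_counts, en_totals)
fr_freq = relative_frequencies(fr_counts, fr_totals)

r = pearson(en_freq, fr_freq)
print(f"Correlation between the English and French series: {r:.3f}")
```

Normalizing by corpus size before correlating keeps differences in corpus volume across periods from dominating the comparison; other measures (for example rank-based correlation) would be equally plausible choices here.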

Crosslingual semantic change, cognates, temporal analysis, semantic analysis, [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], [INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL], [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], 113 Computer and information sciences, 6121 Languages

Related Organizations
Funded by
Cross-Lingual Embeddings for Less-Represented Languages in European News Media
  • Funder: European Commission (EC)
  • Project Code: 825153
  • Funding stream: H2020 | RIA
Validated by funder
EC | NewsEye
NewsEye: A Digital Investigator for Historical Newspapers
  • Funder: European Commission (EC)
  • Project Code: 770299
  • Funding stream: H2020 | RIA