publication . Conference object . 2020

Bilingual lexicon induction across orthographically-distinct under-resourced Dravidian languages

Bharathi Raja Chakravarthi; Navaneethan Rajasekaran; Mihael Arcan; Kevin McGuinness; Noel E. O'Connor; John P. McCrae
Open Access English
  • Published: 13 Dec 2020
  • Publisher: International Committee on Computational Linguistics (ICCL)
  • Country: Ireland
Abstract
Bilingual lexicons are a vital tool for under-resourced languages, and recent state-of-the-art approaches to this task leverage pretrained monolingual word embeddings using supervised or semi-supervised approaches. However, these approaches require cross-lingual information such as seed dictionaries to train the model and find a linear transformation between the word embedding spaces. Especially in the case of low-resourced languages, seed dictionaries are not readily available, and as such, these methods produce extremely weak results on these languages. In this work, we focus on the Dravidian languages, namely Tamil, Telugu, Kannada, and Malayalam, which are even m...
Subjects
ACM Computing Classification System: ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL
free text keywords: Computational linguistics, Information retrieval, Machine translating
Related Organizations
Funded by
EC | Pret-a-LLOD
Project
Pret-a-LLOD
Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors
  • Funder: European Commission (EC)
  • Project Code: 825182
  • Funding stream: H2020 | RIA