descriptionPublicationkeyboard_double_arrow_right Article 24 Nov 2018 English Publisher:Springer Science and Business Media LLCJournal:Language Resources and Evaluation, volume 53, pages 397-418 (issn: 1574-020X, eissn: 1574-0218,

Authors: Saeed, A.; Nawab, R.M.A.; Stevenson, R.M.; Rayson, P.;

doi: 10.1007/s10579-018-9438-7

A word sense disambiguation corpus for Urdu

- Summary
- Subjects
- Related research
  (1)
- Metrics

Abstract

The aim of word sense disambiguation (WSD) is to correctly identify the meaning of a word in context. All natural languages exhibit word sense ambiguities and these are often hard to resolve automatically. Consequently WSD is considered an important problem in natural language processing (NLP). Standard evaluation resources are needed to develop, evaluate and compare WSD methods. A range of initiatives have lead to the development of benchmark WSD corpora for a wide range of languages from various language families. However, there is a lack of benchmark WSD corpora for South Asian languages including Urdu, despite there being over 300 million Urdu speakers and a large amounts of Urdu digital text available online. To address that gap, this study describes a novel benchmark corpus for the Urdu Lexical Sample WSD task. This corpus contains 50 target words (30 nouns, 11 adjectives, and 9 verbs). A standard, manually crafted dictionary called Urdu Lughat is used as a sense inventory. Four baseline WSD approaches were applied to the corpus. The results show that the best performance was obtained using a simple Bag of Words approach. To encourage NLP research on the Urdu language the corpus is freely available to the research community.

Related Organizations

Keywords

410, 400

1 Research products, page 1 of 1

word_sense_disambigation_corpora software on GitHub
IsRelatedTo

Impact byBIP!

	citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	12
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Top 10%
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average