DUT Open Scholar (Du...
SSRN Electronic Journal
Article . 2023 . Peer-reviewed
Data sources: Crossref

Word Sense Disambiguation Pipeline Framework for Low Resourced Morphologically Rich Languages

Authors: Masethe, Mosima Anna; Masethe, Hlaudi Daniel; Ojo, Sunday Olusegun; Owolawi, Pius A.

Abstract

Resolving ambiguity is a long-standing theoretical research challenge in natural language processing. Sesotho sa Leboa is the official name for the language also known as Sepedi or Northern Sotho, an official language of South Africa (among 11 others) spoken by 4.7 million people. Sesotho sa Leboa is an indigenous, morphologically rich, low-resourced South African language and is highly polysemous, with words that are used in numerous contexts. Disambiguating polysemous words remains a challenging problem for computational linguistics research. Deficiencies of several polysemy assessments suggest that dealing with sense distinctiveness versus polysemy remains an unresolved academic issue. Word Sense Disambiguation is a practical problem in natural language processing applications, which suffer drastically when working with ambiguous polysemous words. Therefore, Word Sense Disambiguation seeks both academic and practical results. Many Word Sense Disambiguation applications give high accuracy for English and poor accuracy for Sesotho sa Leboa. In this research, a Word Sense Disambiguation pipeline framework is developed for Sesotho sa Leboa, a low-resourced, morphologically rich language, that addresses the academic and practical sides of the polysemy problem. The proposed pipeline framework includes pre-processing modules that reduce ambiguity in the unstructured text corpus that serves as the source of input sentences. The researchers then compute the probability of Word Sense Disambiguation when polysemy and homonymy are observed, using cosine similarity measures with a sentence transformer (SBERT) and Word2Vec algorithms (Skip-Gram and Continuous Bag of Words).
The computed cosine similarity measures show that SBERT outperforms the other algorithms with an 87% threshold, indicating strong similarity between context and sense definition, while Continuous Bag of Words gives a cosine similarity threshold of 51%, outperforming Skip-Gram algorithms ...
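
The abstract's comparison of a context against candidate sense definitions by cosine similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the vectors below are made-up toy values, the sense labels `sense_a` and `sense_b` are hypothetical, and the helper names `cosine_similarity` and `pick_sense` are introduced only for illustration. In a real pipeline the vectors would come from an embedding model such as SBERT or Word2Vec.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def pick_sense(context_vec, sense_vecs):
    """Score every candidate sense against the context and return the best one.

    sense_vecs maps a sense label to the embedding of its definition.
    """
    scores = {
        sense: cosine_similarity(context_vec, vec)
        for sense, vec in sense_vecs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings (assumed values, not taken from the paper).
context = [0.9, 0.1, 0.3]
senses = {
    "sense_a": [0.8, 0.2, 0.4],
    "sense_b": [0.1, 0.9, 0.2],
}

best, scores = pick_sense(context, senses)
print(best, scores)
```

With these toy values the context vector points in nearly the same direction as `sense_a`, so that sense receives the higher score. A minimum similarity could also be required before a sense is accepted, but how the reported percentages are applied is not specified in the truncated abstract.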

Country
South Africa
Keywords

SBERT, Continuous bag of words, 401, SkipGram, Corpus, Word Sense Disambiguation, Natural Language Processing

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    1
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average