Cross-lingual sentiment classification in low-resource Bengali language

Name: Cross-lingual sentiment classification in low-resource Bengali language
Creator: Salim Sazzed
Keywords: 0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

https://www.aclweb.org/antholo...

Article

License: CC BY

Data sources: UnpayWall

https://doi.org/10.18653/v1/20...

Article . 2020 . Peer-reviewed

Data sources: Crossref

DBLP

Conference object

Data sources: DBLP

https://dx.doi.org/10.18653/v1...

Article

Data sources: Microsoft Academic Graph

Publication type: Article, Conference object
Date: 01 Jan 2020
Publisher: Association for Computational Linguistics (ACL)
Journal: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Authors: Salim Sazzed

doi: 10.18653/v1/2020.wnut-1.8

Abstract

Sentiment analysis research in low-resource languages such as Bengali is still largely unexplored due to the scarcity of annotated data and the lack of text-processing tools. Therefore, in this work, we focus on generating resources and showing the applicability of the cross-lingual sentiment analysis approach in Bengali. For benchmarking, we created and annotated a comprehensive corpus of around 12,000 Bengali reviews. To address the lack of standard text-processing tools in Bengali, we leverage resources from English using machine translation. We measure the performance of supervised machine learning (ML) classifiers on the machine-translated English corpus and compare it with their performance on the original Bengali corpus. In addition, we examine sentiment preservation in the machine-translated corpus using Cohen’s Kappa and Gwet’s AC1. To circumvent the laborious data labeling process, we explore lexicon-based methods and study the applicability of cross-domain labeled data from the resource-rich language. We find that supervised ML classifiers show comparable performance on the Bengali and machine-translated English corpora. By using labeled data, they achieve 15%-20% higher F1 scores than both lexicon-based and transfer learning-based methods. We also observe that machine translation does not alter the sentiment polarity of a review in most cases. Our experimental results demonstrate that the machine-translation-based cross-lingual approach can be an effective way to perform sentiment classification in Bengali.
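The abstract names Cohen’s Kappa and Gwet’s AC1 as the agreement measures used to check sentiment preservation after translation. The following is a minimal sketch of how these two statistics are commonly computed for two label sequences. It is not the paper's code: the function name `agreement_scores`, the toy labels, and the reading of the two sequences as "sentiment of the original review" and "sentiment of its translation" are assumptions made for illustration.

```python
from collections import Counter


def agreement_scores(labels_a, labels_b):
    """Return (Cohen's kappa, Gwet's AC1) for two equal-length label lists.

    Hypothetical helper for illustration; labels_a and labels_b are assumed
    to hold one categorical sentiment label per review.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    categories = sorted(set(labels_a) | set(labels_b))
    q = len(categories)
    if q < 2:
        raise ValueError("at least two distinct categories are required")

    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)

    # Cohen's kappa: chance agreement is the product of the two marginals.
    p_e_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)

    # Gwet's AC1: chance agreement uses the averaged marginal per category.
    pi = {c: (count_a[c] + count_b[c]) / (2 * n) for c in categories}
    p_e_ac1 = sum(pi[c] * (1 - pi[c]) for c in categories) / (q - 1)
    ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)

    return kappa, ac1


# Toy data (invented for this sketch, not taken from the corpus).
original = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]
translated = ["pos", "pos", "neg", "pos", "pos", "neg", "pos", "neg"]

kappa, ac1 = agreement_scores(original, translated)
print(f"Cohen's kappa = {kappa:.3f}, Gwet's AC1 = {ac1:.3f}")
```

Both statistics correct the raw agreement rate for agreement expected by chance, but they estimate that chance term differently, which is why the two can diverge when the class distribution is skewed.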

Related Organizations

Old Dominion University
United States

Related Research Products

BN-Dataset software on GitHub (IsRelatedTo)
langdetect software on GitHub (IsRelatedTo)

Impact by BIP!

Selected citations: 21 (derived from selected sources; an alternative to the "Influence" indicator)
Popularity: Top 10% (the "current" impact or attention of the article in the research community at large, based on the underlying citation network)
Influence: Top 10% (the overall impact of the article in the research community at large, based on the underlying citation network, diachronically)
Impulse: Top 10% (the initial momentum of the article directly after its publication, based on the underlying citation network)