Powered by OpenAIRE graph
Open Access — Edinburgh Research A...

Arabic sarcasm detection

Author: Abu Farha, Ibrahim


Abstract

Sarcasm is a form of verbal irony that is often used to express ridicule or contempt. When using sarcasm, a speaker expresses an opinion indirectly, so that the literal meaning differs from the intended one. Sarcasm is also a sociolinguistic tool that people use to express themselves, and it reflects their cultural and social background. Sarcasm detection refers to the process of automatically identifying whether a piece of text is sarcastic. This task has been well studied for English, but Arabic lags behind. In this thesis, we work to fill the gaps in research on Arabic sarcasm detection. First, we explore approaches to creating an Arabic sarcasm dataset. We create the ArSarcasm dataset through the re-annotation of existing sentiment analysis datasets. Its labels represent perceived sarcasm, since they reflect the annotators' perception rather than the authors' intent. The analysis shows that sarcasm is prominent in the underlying sentiment datasets, with 16% of the sentences being sarcastic, and our experiments show that sarcasm is disruptive for sentiment analysers. The analysis also shows that annotating subjective content is challenging and prone to biases. Second, to mitigate the shortcomings of third-party sarcasm data collection, we propose collecting sarcasm datasets by asking authors to label their own text, which captures intended sarcasm. The resulting first-party-annotated dataset has more reliable and trustworthy labels and avoids the issues of third-party annotation. Next, we benchmark state-of-the-art machine learning models on the newly created datasets. These experiments show that intended sarcasm detection is more challenging than perceived sarcasm detection, and that monolingual Arabic language models whose pre-training data includes dialects perform better on the sarcasm detection task.
Additionally, we provide the details of shared tasks that use the new datasets. Finally, we present an in-depth error analysis comparing human performance in sarcasm detection against that of state-of-the-art models. Our analysis confirms that sarcasm is challenging for both humans and machines. We also highlight the features and patterns used to express sarcasm, such as idioms and proverbs. Extending the analysis to Arabic dialects, we find that dialect familiarity affects how Arabic speakers understand and interpret sarcasm: speakers were better able to detect sarcasm expressed in their own dialect or one they were familiar with.
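The benchmark experiments described above come down to binary classification (sarcastic vs. not) evaluated per class. As a minimal illustration of that kind of evaluation — using toy, hypothetical labels, not the thesis data or its actual metric implementation — the F1 score for the sarcastic class can be computed as:

```python
def f1_per_class(gold, pred, label):
    # Precision, recall, and F1 for a single class (e.g. "sarcastic").
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy gold and predicted labels (hypothetical, for illustration only).
gold = ["sarcastic", "not", "sarcastic", "not", "not", "sarcastic"]
pred = ["sarcastic", "not", "not", "not", "sarcastic", "sarcastic"]

print(round(f1_per_class(gold, pred, "sarcastic"), 2))  # → 0.67
```

Reporting F1 on the sarcastic class (rather than plain accuracy) matters here because, as the abstract notes, only about 16% of sentences are sarcastic, so a classifier that always predicts "not sarcastic" would look deceptively accurate.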

Country
United Kingdom
Related Organizations
Keywords

sarcasm dataset, Arabic, irony, Arabic dialects, sarcasm, Arabic sarcasm, sarcasm detection, Arabic irony

  • BIP! impact indicators (based on the underlying citation network):
    - Selected citations (derived from selected sources): 0
    - Popularity (current attention in the research community, the "hype"): Average
    - Influence (overall/total impact, diachronic): Average
    - Impulse (initial momentum directly after publication): Average
Open Access route: Green