Corpora for sentiment analysis of Arabic text in social media

Part of book or chapter of book English OPEN
Itani, Maher ; Roast, Chris ; Al-Khayatt, Samir (2017)
  • Publisher: IEEE
  • Related identifiers: doi: 10.1109/IACS.2017.7921947

Natural Language Processing (NLP) applications such as text categorization and machine translation need annotated corpora to check quality and performance. Similarly, sentiment analysis requires annotated corpora to test the performance of classifiers. Manual annotation performed by native speakers serves as the benchmark against which a classifier's accuracy is measured. In this paper we summarise currently available Arabic corpora and describe work in progress to build, annotate, and use Arabic corpora consisting of Facebook (FB) posts. What distinguishes these corpora is that they are based on posts written in Dialectal Arabic (DA), which does not follow specific grammatical or spelling standards. The corpora are annotated with five labels (positive, negative, dual, neutral, and spam). In addition to building the corpora, the paper illustrates how manual tagging can be used to extract opinionated words and phrases for use in a lexicon-based classifier.
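The idea of feeding manually tagged phrases into a lexicon-based classifier, and scoring it against the manual annotation, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`build_lexicon`, `classify`, `accuracy`), the decision rule (any spam phrase gives spam; hits from both polarities give dual), and the sample data are assumptions made for illustration, and toy English tokens stand in for Dialectal Arabic entries.

```python
# Hypothetical sketch: the rule set and all names are assumptions,
# not taken from the paper.

def build_lexicon(tagged_phrases):
    """Group manually tagged phrases by polarity ('positive', 'negative', 'spam')."""
    lexicon = {"positive": set(), "negative": set(), "spam": set()}
    for phrase, polarity in tagged_phrases:
        lexicon[polarity].add(phrase.lower())
    return lexicon


def classify(post, lexicon):
    """Assign one of the five labels by counting lexicon phrases found in the post."""
    # Normalise whitespace and pad with spaces so only whole words/phrases match.
    # Punctuation is not stripped in this sketch.
    text = f" {' '.join(post.lower().split())} "

    def hits(polarity):
        return sum(f" {phrase} " in text for phrase in lexicon[polarity])

    if hits("spam"):
        return "spam"
    pos, neg = hits("positive"), hits("negative")
    if pos and neg:
        return "dual"
    if pos:
        return "positive"
    if neg:
        return "negative"
    return "neutral"


def accuracy(posts, gold_labels, lexicon):
    """Fraction of posts whose predicted label equals the manual (gold) label."""
    predicted = [classify(post, lexicon) for post in posts]
    correct = sum(p == g for p, g in zip(predicted, gold_labels))
    return correct / len(gold_labels)


# Placeholder phrases as an annotator might have extracted them.
lexicon = build_lexicon([
    ("great service", "positive"),
    ("love", "positive"),
    ("slow", "negative"),
    ("terrible", "negative"),
    ("click here", "spam"),
])

# Placeholder posts with their manual labels.
posts = [
    "Great service and I love it",
    "terrible and slow",
    "great service but slow delivery",
    "see you tomorrow",
    "click here to win",
]
gold = ["positive", "negative", "dual", "neutral", "spam"]

print(accuracy(posts, gold, lexicon))
```

A real system for Dialectal Arabic would likely need spelling normalisation before lookup, since the posts follow no fixed orthography; the sketch omits that step.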
