research data . Dataset . 2021

Sentiment analysis in Galaxy with IMDB movie review dataset

Kaivan Kamali
Open Access English
  • Published: 28 Jan 2021
  • Publisher: Zenodo
Abstract
IMDB movie review sentiment classification dataset (Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011)). For more information, please refer to: https://ai.stanford.edu/~amaas/data/sentiment/

The IMDB dataset was modified as follows to prepare it for use in a Galaxy Training tutorial (https://training.galaxyproject.org/):
  • The 50 most frequent words are excluded (mostly stop words).
  • The next 10,000 most frequent words are included.
  • Reviews are limited to a maximum of 500 words (longer reviews are trimmed and shorter reviews are padded).
  • 25,000 reviews are used for training and 25,000 for testing.
Files are in TSV (tab-separated values) format so that they can be consumed by Galaxy (www.usegalaxy.org).
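
The preprocessing described above can be illustrated with a minimal Python sketch. This is not the script used to produce the dataset. The function names `build_vocab` and `encode` are hypothetical, as are the reserved ids (0 for padding, 1 for excluded or unknown words), the choice to replace excluded words with a placeholder rather than drop them, and the choice to trim and pad at the end of each review. The published files may differ on any of these points.

```python
from collections import Counter


def build_vocab(tokenized_reviews, skip_top=50, num_words=10000):
    """Map words to integer ids by frequency rank.

    The `skip_top` most frequent words are excluded and the next
    `num_words` words are kept. Ids 0 and 1 are reserved (assumption):
    0 for padding, 1 for excluded or unknown words.
    """
    counts = Counter(tok for review in tokenized_reviews for tok in review)
    ranked = [word for word, _ in counts.most_common()]
    kept = ranked[skip_top:skip_top + num_words]
    return {word: idx + 2 for idx, word in enumerate(kept)}


def encode(review, vocab, max_len=500, pad_id=0, oov_id=1):
    """Convert one tokenized review to a fixed-length list of ids.

    Longer reviews are trimmed at the end and shorter ones are padded
    at the end (assumption: the dataset may trim or pad at the start).
    """
    ids = [vocab.get(tok, oov_id) for tok in review][:max_len]
    return ids + [pad_id] * (max_len - len(ids))


if __name__ == "__main__":
    # Toy data with small limits so the effect is visible.
    reviews = [
        ["the", "movie", "was", "great"],
        ["the", "movie", "the", "plot"],
    ]
    vocab = build_vocab(reviews, skip_top=1, num_words=2)
    print(vocab)
    print(encode(reviews[0], vocab, max_len=6))
```

With the defaults (`skip_top=50`, `num_words=10000`, `max_len=500`), every review becomes a sequence of exactly 500 integers, which is the kind of fixed-width row that fits a tab-separated file.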
Subjects
free text keywords: IMDB, Sentiment Analysis, Movie reviews