
handle: 11454/62380
Abstract In recent years, sentiment analysis has become increasingly important as the number of digital text resources has grown alongside developments in information technology. Feature selection is a crucial sub-stage of sentiment analysis because it can improve the overall predictive performance of a classifier while reducing the dimensionality of the problem. In this study, we propose a novel wrapper feature selection algorithm based on the Iterated Greedy (IG) metaheuristic for sentiment classification. We also develop a selection procedure based on pre-calculated filter scores for the greedy construction phase of the IG algorithm. A comprehensive experimental study is conducted on commonly used sentiment analysis datasets to assess the performance of the proposed method. The computational results show that, using a Multinomial Naive Bayes classifier, the proposed algorithm achieves average accuracy rates of 96.45% and 90.74% on 9 public sentiment datasets and 4 Amazon product review datasets, respectively. The results also reveal that our algorithm outperforms state-of-the-art results on the 9 public sentiment datasets. Moreover, the proposed algorithm produces results that are highly competitive with state-of-the-art feature selection algorithms on the 4 Amazon datasets.
Sentiment classification, Feature selection, Machine learning, Metaheuristic, Iterated greedy
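The abstract describes the method only at a high level, so the following is a minimal sketch of how an Iterated Greedy wrapper with filter-guided construction might look, not the authors' implementation. The function name `iterated_greedy_select`, the tournament-style selection rule, the fixed subset size, the acceptance criterion, and the toy evaluator are all assumptions made for illustration. In a real wrapper, `evaluate` would train and score a classifier (the paper uses Multinomial Naive Bayes) on the candidate feature subset.

```python
import random
from typing import Callable, Dict, FrozenSet, Set, Tuple


def iterated_greedy_select(
    filter_scores: Dict[str, float],
    evaluate: Callable[[FrozenSet[str]], float],
    subset_size: int,
    destroy_size: int,
    iterations: int,
    tournament: int = 2,
    seed: int = 0,
) -> Tuple[FrozenSet[str], float]:
    """Hypothetical Iterated Greedy wrapper for feature selection."""
    rng = random.Random(seed)
    features = sorted(filter_scores)

    def construct(partial: Set[str]) -> Set[str]:
        # Greedy construction (assumed rule): repeatedly draw a few unused
        # features at random and keep the one with the best filter score.
        result = set(partial)
        pool = [f for f in features if f not in result]
        while len(result) < subset_size and pool:
            candidates = rng.sample(pool, min(tournament, len(pool)))
            chosen = max(candidates, key=filter_scores.__getitem__)
            result.add(chosen)
            pool.remove(chosen)
        return result

    current = construct(set())
    current_score = evaluate(frozenset(current))
    best, best_score = set(current), current_score

    for _ in range(iterations):
        # Destruction: drop a few randomly chosen features from the solution.
        removed = set(rng.sample(sorted(current), min(destroy_size, len(current))))
        candidate = construct(current - removed)
        score = evaluate(frozenset(candidate))
        # Acceptance (assumed): keep the candidate if it is not worse.
        if score >= current_score:
            current, current_score = candidate, score
        if score > best_score:
            best, best_score = set(candidate), score

    return frozenset(best), best_score


# Toy stand-ins: made-up filter scores and a fake "accuracy" that rewards
# picking the informative words. Not data from the paper.
toy_scores = {"good": 0.9, "bad": 0.8, "great": 0.7, "the": 0.2, "and": 0.1, "of": 0.05}
informative = {"good", "bad", "great"}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    return len(subset & informative) / len(subset)


selected, score = iterated_greedy_select(
    toy_scores, toy_evaluate, subset_size=3, destroy_size=1, iterations=50
)
print(sorted(selected), score)
```

The filter scores are computed once and only steer which features the construction step proposes; the wrapper evaluation alone decides whether a rebuilt subset is kept. This keeps the number of expensive classifier evaluations to one per iteration.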
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 113 |
| Popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 1% |
| Influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| Impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 1% |
