
A key challenge in text classification is overall performance. In this paper we discuss the need for parallel and distributed data mining algorithms in text classification. Three classification algorithms are considered: the k-NN classifier, the Centroid classifier, and the Naive Bayes classifier. As data volumes grow, models such as Hadoop, MapReduce, and Spark become increasingly important in data science. The paper reports empirical findings on the algorithms' efficiency, measured by classification accuracy and execution speed. The empirical results show that the Centroid classifier is the most accurate in this setting, with up to 95% accuracy, compared with 92% for k-NN and 91.5% for Naive Bayes.
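For readers unfamiliar with the Centroid classifier, the following is a minimal single-machine sketch of the general technique, not the implementation evaluated in the paper. The whitespace tokenizer, the raw term-frequency weighting, the cosine similarity measure, the function names, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Map a document to a term-frequency vector (assumed whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train_centroids(docs):
    """Average the term vectors of each class; docs is a list of (text, label)."""
    sums = defaultdict(Counter)
    counts = Counter()
    for text, label in docs:
        sums[label].update(vectorize(text))  # per-class running sum of term counts
        counts[label] += 1
    return {
        label: {term: w / counts[label] for term, w in vec.items()}
        for label, vec in sums.items()
    }


def classify(text, centroids):
    """Assign the label whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus, for illustration only.
TRAIN = [
    ("the team won the match", "sport"),
    ("the player scored a goal", "sport"),
    ("the market fell on weak earnings", "finance"),
    ("stocks and bonds rallied", "finance"),
]
centroids = train_centroids(TRAIN)
print(classify("the goal won the match", centroids))
```

Training here reduces to a per-class sum of term vectors followed by a division, and classification needs only one similarity computation per class, whereas k-NN compares a document against every training example.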
