Powered by OpenAIRE graph

This Research product is the result of merged Research products in OpenAIRE.

Overview of data mining classification techniques: Traditional vs. parallel/distributed programming models

Authors: Nuhi Besimi; Betim Cico; Adrian Besimi

Abstract

The key challenge in text classification techniques is overall performance. In this paper we discuss the need for parallel/distributed data mining algorithms for text classification. Three classification algorithms are considered: the k-NN Classifier, the Centroid Classifier and the Naive Bayes Classifier. As data volumes grow, the Hadoop, MapReduce and Spark models become increasingly important in data science. The main outcome of the paper is a set of empirical findings on the algorithms' efficiency, measured by classification accuracy and execution speed. The empirical results show that the Centroid Classifier is the most accurate in this setting, reaching up to 95% accuracy, compared with 92% for k-NN and 91.5% for Naive Bayes.
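The three classifiers named in the abstract can be illustrated with a minimal, single-machine Python sketch. It assumes plain bag-of-words term-frequency features, cosine similarity for k-NN and the Centroid Classifier, and multinomial Naive Bayes with add-one smoothing; the paper's actual features, parameters and Hadoop/Spark code are not given here, so this is not the authors' implementation. The centroid training step is written as a "map" over data partitions followed by a "reduce" that merges partial per-class sums, which is the pattern that makes it a natural fit for MapReduce or Spark. The helper names (`map_partition`, `reduce_sums`, `train_centroids`, etc.) and the six training sentences are hypothetical and do not reproduce the reported accuracies.

```python
import math
from collections import Counter, defaultdict
from functools import reduce

def vectorize(text):
    # Assumption: plain lowercase bag-of-words term frequencies.
    return Counter(text.lower().split())

def normalize(vec):
    # Scale a sparse vector (dict term -> weight) to unit length.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(a, b):
    # Dot product of two unit-length sparse vectors, i.e. their cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())

# --- Centroid classifier, trained map/reduce style (hypothetical helpers) ---
def map_partition(partition):
    # "Map": sum unit-length document vectors per class inside one partition.
    sums = defaultdict(Counter)
    for text, label in partition:
        sums[label].update(normalize(vectorize(text)))
    return sums

def reduce_sums(a, b):
    # "Reduce": merge the per-class sums coming from two partitions.
    merged = defaultdict(Counter)
    for sums in (a, b):
        for label, vec in sums.items():
            merged[label].update(vec)
    return merged

def train_centroids(partitions):
    totals = reduce(reduce_sums, map(map_partition, partitions))
    # Normalizing the per-class sum gives the same direction as the class mean.
    return {label: normalize(vec) for label, vec in totals.items()}

def predict_centroid(centroids, text):
    q = normalize(vectorize(text))
    return max(centroids, key=lambda label: cosine(q, centroids[label]))

# --- k-NN: majority vote among the k most cosine-similar training documents ---
def predict_knn(train, text, k=3):
    q = normalize(vectorize(text))
    scored = sorted(
        ((cosine(q, normalize(vectorize(t))), label) for t, label in train),
        reverse=True,
    )
    return Counter(label for _, label in scored[:k]).most_common(1)[0][0]

# --- Multinomial Naive Bayes with add-one (Laplace) smoothing ---
def train_nb(train):
    doc_counts = Counter(label for _, label in train)
    term_counts = defaultdict(Counter)
    for text, label in train:
        term_counts[label].update(vectorize(text))
    vocab = {t for counts in term_counts.values() for t in counts}
    return doc_counts, term_counts, vocab

def predict_nb(model, text):
    doc_counts, term_counts, vocab = model
    n_docs = sum(doc_counts.values())

    def log_posterior(label):
        total = sum(term_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)
        for t, n in vectorize(text).items():
            if t in vocab:  # terms never seen in training are ignored
                score += n * math.log((term_counts[label][t] + 1) / (total + len(vocab)))
        return score

    return max(doc_counts, key=log_posterior)

# Toy data, invented for illustration only.
train = [
    ("the striker scored a late goal", "sport"),
    ("the team won the league match", "sport"),
    ("the goalkeeper saved a penalty", "sport"),
    ("the central bank raised interest rates", "finance"),
    ("shares fell as the market closed", "finance"),
    ("investors bought bonds and shares", "finance"),
]
partitions = [train[0::2], train[1::2]]  # stand-ins for HDFS blocks or Spark partitions
centroids = train_centroids(partitions)
nb_model = train_nb(train)

query = "investors sold shares as the market fell"
print(predict_centroid(centroids, query), predict_knn(train, query), predict_nb(nb_model, query))
# → finance finance finance
```

Because the per-class sums are additive, partitions can be processed independently and merged in any order, so only the small centroid vectors need to be shuffled between workers. Brute-force k-NN as written here, by contrast, compares each query with every training document, so its prediction cost grows with the size of the training set.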

Impact indicators (provided by BIP!, based on the underlying citation network)
  • Selected citations: 9 (derived from selected sources; an alternative to the "Influence" indicator)
  • Popularity: Average (the "current" impact/attention of the article)
  • Influence: Top 10% (the overall/total impact of the article, diachronically)
  • Impulse: Top 10% (the initial momentum of the article directly after its publication)