Apache Mahout's k-Means vs Fuzzy k-Means Performance Evaluation

Publication type: Article, Conference object; Date: 01 Sep 2016; Country: Spain; Publisher: IEEE; Journal: 2016 International Conference on Intelligent Networking and Collaborative Systems (INCoS)

Authors: Xhafa Xhafa, Fatos; Bogza, Adriana; Caballé Llobet, Santiago; Barolli, Leonard

doi: 10.1109/incos.2016.103

handle: 2117/93025


Abstract

The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has raised many questions about how to extract and use the knowledge contained in data within short time frames, on limited budgets, and under high rates of data generation. The foremost challenge identified here is data processing, especially mining and analysis for knowledge extraction. Because the 'old' data mining frameworks were designed without Big Data requirements in mind, a new generation of such frameworks is being developed and implemented entirely on Cloud platforms. One such framework is Apache Mahout, which aims to support fast processing and analysis of Big Data. The performance of these new data mining frameworks has yet to be evaluated, and their potential limitations have yet to be revealed. In this paper we analyse the performance of Apache Mahout using large real data sets from the Twitter stream. We illustrate the analysis with two clustering algorithms, k-Means and Fuzzy k-Means, using a Hadoop cluster infrastructure for the experimental study.

(c) 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

Peer Reviewed

Country

Spain

Related Organizations

Open University of Catalonia
Spain
Fukuoka Institute of Technology
Japan
Universitat Politècnica de Catalunya
Spain

Keywords

Performance, Hadoop cluster, Computer algorithms, Big data, UPC subject areas::Computer science::Information systems, Data mining, Data mining algorithms, k-Means, Apache Mahout, Fuzzy k-Means

Impact by BIP!

selected citations: 3. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.