Downloads provided by UsageCounts
handle: 2117/93025
(c) 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has raised many questions about how to extract and use the knowledge contained in data within short time frames, on limited budgets, and under high rates of data generation. The foremost challenge identified here is data processing, especially mining and analysis for knowledge extraction. Because the 'old' data mining frameworks were designed without Big Data requirements in mind, a new generation of such frameworks is being developed and implemented entirely on Cloud platforms. One such framework is Apache Mahout, which aims to support fast processing and analysis of Big Data. The performance of these new data mining frameworks has yet to be evaluated, and their potential limitations have yet to be revealed. In this paper we analyse the performance of Apache Mahout using large real data sets from the Twitter stream. We exemplify the analysis with two clustering algorithms, namely k-Means and Fuzzy k-Means, using a Hadoop cluster infrastructure for the experimental study.
Peer Reviewed
UPC subject areas::Informatics::Information systems, Performance, Big data, Hadoop cluster, Computer algorithms, Data mining, Data mining algorithms, K-means, Fuzzy k-means, Apache Mahout
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 3 |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | 29 |
| downloads | 90 |

Views provided by UsageCounts