A Parallel Cop-Kmeans Clustering Algorithm Based on MapReduce Framework

Chao Lin; Yan Yang; Tonny Rutayisire

https://doi.org/10.1007/978-3-...

Part of book or chapter of book . 2011 . Peer-reviewed

Data sources: Crossref

https://dx.doi.org/10.1007/978...

Other literature type

Data sources: Microsoft Academic Graph

Publication: Part of book or chapter of book, Other literature type. 01 Jan 2011. Publisher: Springer Berlin Heidelberg

Authors: Chao Lin; Yan Yang; Tonny Rutayisire

doi: 10.1007/978-3-642-25661-5_13

Abstract

Clustering with background information has recently become highly desirable in many business applications because it can capture important semantics of the business or dataset. Must-Link and Cannot-Link constraints between a given pair of instances in the dataset are common forms of prior knowledge incorporated into many clustering algorithms today. Cop-Kmeans incorporates these constraints into its clustering mechanism. However, as the scale of data grows rapidly, it is becoming overwhelmingly difficult for Cop-Kmeans to handle massive datasets. In this paper, we propose a parallel Cop-Kmeans algorithm based on MapReduce, a technique that distributes the clustering load over a given number of processors. Experimental results show that this approach scales well to massive datasets while maintaining all crucial characteristics of the serial Cop-Kmeans algorithm.
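The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes the usual Cop-Kmeans rule (assign each point to the nearest center that breaks no Must-Link or Cannot-Link constraint) and phrases one iteration as a map step that emits (cluster, point) pairs and a reduce step that averages them. All names (`violates`, `map_assign`, `reduce_centers`, `cop_kmeans`) and the dictionary layout of the constraints are hypothetical. It runs in a single process and does not address how constraints that span different data splits would be handled on a real cluster.

```python
import math
from collections import defaultdict


def violates(i, cluster, assigned, must_link, cannot_link):
    """Return True if placing point i in `cluster` breaks a constraint
    with a point that has already been assigned."""
    for j in must_link.get(i, ()):
        if j in assigned and assigned[j] != cluster:
            return True
    for j in cannot_link.get(i, ()):
        if j in assigned and assigned[j] == cluster:
            return True
    return False


def map_assign(points, centers, must_link, cannot_link):
    """Map step (sketch): emit (cluster, point) for the nearest feasible center."""
    assigned, emitted = {}, []
    for i, p in enumerate(points):
        # Try centers from nearest to farthest.
        order = sorted(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for c in order:
            if not violates(i, c, assigned, must_link, cannot_link):
                assigned[i] = c
                emitted.append((c, p))
                break
        else:
            raise ValueError(f"no feasible cluster for point {i}")
    return assigned, emitted


def reduce_centers(emitted, centers):
    """Reduce step (sketch): average the points emitted under each cluster key."""
    groups = defaultdict(list)
    for c, p in emitted:
        groups[c].append(p)
    new_centers = list(centers)  # clusters that received no points keep their center
    for c, pts in groups.items():
        new_centers[c] = tuple(sum(xs) / len(pts) for xs in zip(*pts))
    return new_centers


def cop_kmeans(points, centers, must_link, cannot_link, max_iter=20):
    """Alternate the map and reduce steps until the centers stop changing."""
    assigned = {}
    for _ in range(max_iter):
        assigned, emitted = map_assign(points, centers, must_link, cannot_link)
        new_centers = reduce_centers(emitted, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return assigned, centers


# Toy data: points 0 and 1 must share a cluster; points 1 and 2 must not.
points = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (5.0, 5.0)]
must_link = {0: [1], 1: [0]}
cannot_link = {1: [2], 2: [1]}
assigned, centers = cop_kmeans(points, [(0.0, 0.0), (5.0, 5.0)], must_link, cannot_link)
```

In this toy run, point 2 lies closest to the first center but is pushed to the second cluster by its Cannot-Link constraint with point 1, which is the behavior that distinguishes Cop-Kmeans from plain k-means.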

Related Organizations

Southwest Jiaotong University
China (People's Republic of)
Southeast University
China (People's Republic of)

Impact by BIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	11
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average
