Clustering data streams

Publication: Article, Conference object
Date: 08 Nov 2002
Publisher: IEEE Comput. Soc
Journal: Proceedings 41st Annual Symposium on Foundations of Computer Science

Authors: Sudipto Guha; Nina Mishra; Rajeev Motwani; Liadan O'Callaghan

doi: 10.1109/sfcs.2000.892124


Abstract

We study clustering in the data stream model of computation: given a sequence of points, the objective is to maintain a consistently good clustering of the sequence observed so far, using a small amount of memory and time. The data stream model is relevant to new classes of applications involving massive data sets, such as Web click stream analysis and multimedia data analysis. We give constant-factor approximation algorithms for the k-median problem in the data stream model of computation that use a single pass over the data. We also show negative results implying that our algorithms cannot be improved in a certain sense.
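
The abstract states the setting but not the algorithm. As a rough illustration of what single-pass, small-memory k-median clustering can look like, the sketch below reads the stream in fixed-size chunks, reduces each chunk to k weighted centers, and re-clusters the accumulated centers whenever they fill a chunk. It is an assumption-laden toy, not the authors' algorithm, and it carries none of the paper's approximation guarantees. All names (`reduce_to_k`, `stream_k_median`, `chunk_size`) are hypothetical, and the per-chunk solver is a brute-force search that is practical only for very small chunks.

```python
from itertools import combinations


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def cost(weighted_points, centers):
    """k-median objective: weighted sum of distances to the nearest center."""
    return sum(w * min(dist(p, c) for c in centers) for p, w in weighted_points)


def reduce_to_k(weighted_points, k):
    """Replace a small weighted point set by at most k weighted centers.

    Brute force over all k-subsets of the distinct points (illustrative
    stand-in for a real k-median subroutine; exponential in k).
    """
    if len(weighted_points) <= k:
        return list(weighted_points)
    candidates = sorted(set(p for p, _ in weighted_points))
    if len(candidates) <= k:
        centers = tuple(candidates)
    else:
        centers = min(combinations(candidates, k),
                      key=lambda cs: cost(weighted_points, cs))
    # Each center inherits the total weight of the points assigned to it.
    weights = {c: 0 for c in centers}
    for p, w in weighted_points:
        nearest = min(centers, key=lambda c: dist(p, c))
        weights[nearest] += w
    return [(c, w) for c, w in weights.items() if w > 0]


def stream_k_median(stream, k, chunk_size):
    """Single pass over `stream`, holding roughly chunk_size + k points.

    Assumes chunk_size > k.
    """
    buffer = []   # raw points of the current chunk, each with weight 1
    summary = []  # weighted centers produced by earlier chunks
    for p in stream:
        buffer.append((tuple(p), 1))
        if len(buffer) == chunk_size:
            summary.extend(reduce_to_k(buffer, k))
            buffer = []
            if len(summary) >= chunk_size:
                # Re-cluster the centers themselves to keep memory bounded.
                summary = reduce_to_k(summary, k)
    return reduce_to_k(summary + buffer, k)
```

For example, `stream_k_median(points, k=2, chunk_size=4)` returns a list of `(center, weight)` pairs whose weights sum to the number of points read. Each re-clustering step works on summaries of summaries, so errors can compound across levels. Bounding that growth is the hard part, and this sketch makes no attempt to do so.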

Related Organizations

Stanford University
United States
Hewlett-Packard (United States)
United States

Impact by BIP!

- selected citations: 247. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 0.1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.