
Summary: We study the problem of clustering with respect to the diameter and the radius costs, and we approach it from within the framework of property testing. In property testing, the goal is to determine whether a given object has a particular property or whether it must be modified significantly to obtain that property. In the context of clustering, testing takes the following form: the algorithm is given parameters \(k\), \(b\), \(\beta\), and \(\varepsilon\), and it can sample from the set of points \(X\). The goal of the algorithm is to distinguish between the case when \(X\) is \((k,b)\)-clusterable and the case when \(X\) is \(\varepsilon\)-far from being \((k,\beta)\)-clusterable. By \(\varepsilon\)-far from being \((k,\beta)\)-clusterable we mean that more than \(\varepsilon\cdot|X|\) points must be removed from \(X\) before it becomes \((k,\beta)\)-clusterable. In this work we describe and analyze algorithms that use a sample of size polynomial in \(k\) and \(1/\varepsilon\) and independent of \(|X|\). (The dependence on \(\beta\) and on the dimension, \(d\), of the points varies with the different algorithms.) Such algorithms may be especially useful when the set of points \(X\) is very large and it may not even be feasible to observe all of it. Our algorithms can also be used to find approximately good clusterings, that is, clusterings of all but an \(\varepsilon\)-fraction of the points in \(X\) that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of \(|X|\).
Keywords: analysis of algorithms and problem complexity, analysis of algorithms, randomized algorithms, property testing, approximation algorithms, clustering
| Indicator | Value |
| --- | --- |
| Citations | 54 |
| Popularity | Top 10% |
| Influence | Top 1% |
| Impulse | Top 10% |
