
Abstract: Spectral clustering scales poorly in both memory usage and computation time when the number of data instances N is large. To address this issue, we present a fast spectral clustering algorithm that can handle millions of data points on a desktop PC. The proposed technique relies on a kernel-based formulation of the spectral clustering problem, also known as kernel spectral clustering. In this framework, the Nyström approximation of the feature map of size m, with m ≪ N, is used to solve the primal optimization problem. This reduces the time complexity from O(N^3) to O(mN) and the space complexity from O(N^2) to O(mN). The effectiveness of the proposed algorithm, in terms of both computational efficiency and clustering quality, is illustrated on several datasets.
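To make the role of the Nyström approximation concrete, the following is a minimal sketch of an explicit approximate feature map built from m landmark points. It is not the authors' implementation: the RBF kernel, the uniform random choice of landmarks, the eigenvalue cutoff, and the names `rbf_kernel` and `nystrom_feature_map` are all assumptions made for illustration. The sketch only shows why storage stays at O(mN): the N × N kernel matrix is never formed, only an N × m block and an m × m block.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """RBF kernel matrix between rows of A and rows of B (assumed kernel choice)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def nystrom_feature_map(X, m, sigma=1.0, seed=0):
    """Approximate feature map Phi (N x r, r <= m) with Phi @ Phi.T ~ K.

    Landmarks are drawn uniformly at random, which is an illustrative
    assumption rather than the selection scheme of the paper.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    landmarks = X[idx]

    # Small m x m kernel block and its eigendecomposition.
    K_mm = rbf_kernel(landmarks, landmarks, sigma)
    eigvals, eigvecs = np.linalg.eigh(K_mm)

    # Drop numerically negligible directions to keep the scaling stable.
    keep = eigvals > 1e-10 * eigvals.max()
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]

    # N x m cross-kernel block, projected and rescaled.
    K_nm = rbf_kernel(X, landmarks, sigma)
    Phi = K_nm @ eigvecs / np.sqrt(eigvals)
    return Phi, idx


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(1000, 2))
    Phi, idx = nystrom_feature_map(X, m=50)
    print(Phi.shape)
```

With such a map in hand, a primal problem can be posed directly on the N × r matrix `Phi` instead of on the full kernel matrix, which is the general idea the abstract describes; the specific primal formulation used by the authors is not reproduced here.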
Technology, Science & Technology, SISTA, Kernel methods, 46 Information and computing sciences, Computer Science, Artificial Intelligence, 09 Engineering, 17 Psychology and Cognitive Sciences, Big data, NYSTROM METHOD, Nyström approximation, LARGE DATA SETS, 52 Psychology, Spectral clustering, Artificial Intelligence & Image Processing, 08 Information and Computing Sciences, 40 Engineering
