
Handling vast amounts of high-dimensional data has always been challenging. Advances in computer technology have led to exponential growth in accumulated information, and its storage and processing must be handled carefully because not all of the information gathered is useful. Feature selection and feature reduction algorithms have been proposed to process such data. In this paper, we review sparse clustering approaches, which cluster data sets while selecting features and removing redundant and irrelevant ones, as well as noisy points and outliers. We survey existing sparse clustering algorithms and explore their effectiveness in the analysis of high-dimensional data. Our exploration and analysis indicate that these sparse clustering approaches outperform conventional clustering algorithms when faced with large, high-dimensional data. We also investigate the use of regularization terms in sparse clustering algorithms. The survey is based on an extensive search of published papers and textbooks, and we discuss important results reported in the literature. We further compare the algorithms' strengths, limitations, adaptability, interpretability, complexity, and usability in handling high-dimensional data, and analyze further research directions and topics in sparse clustering.
feature selection, fuzzy c-means, sparsity, Electrical engineering. Electronics. Nuclear engineering, Lasso, K-means, Clustering, TK1-9971
