
Model-based clustering is a popular approach for clustering multivariate data that has seen applications in numerous fields. High-dimensional data are increasingly common, and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received considerable attention and research effort in recent years. Even for small-scale problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review summarizes the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated on two data analysis examples.
FOS: Computer and information sciences, Variable selection, Classification and discrimination; cluster analysis (statistical aspects), Model-based clustering, Machine Learning (stat.ML), Statistics - Applications, Methodology (stat.ME), Gaussian mixture model, R packages, Statistics - Machine Learning, Latent class analysis, Applications (stat.AP), Computational methods for problems pertaining to statistics, Statistics - Methodology
| selected citations | 89 |
| popularity | Top 1% |
| influence | Top 10% |
| impulse | Top 1% |
