Variable selection methods for model-based clustering

Publication type: Article, Preprint, Other literature type
Date: 01 Jan 2018
Embargo end date: 01 Jan 2017
Country: Ireland
Publisher: Institute of Mathematical Statistics
Journal: Statistics Surveys, volume 12 (ISSN: 1935-7516)
Publicly funded

Authors: Fop, Michael; Murphy, Thomas Brendan

doi: 10.1214/18-ss119, 10.48550/arxiv.1707.00306

arXiv: 1707.00306

handle: 10197/10520


Abstract

Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small size problems, variable selection has been advocated to facilitate the interpretation of the clustering results. This review provides a summary of the methods developed for variable selection in model-based clustering. Existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples.

Country

Ireland

Related Organizations

University College Dublin
Ireland

Keywords

FOS: Computer and information sciences, Variable selection, Classification and discrimination; cluster analysis (statistical aspects), Model-based clustering, Machine Learning (stat.ML), Statistics - Applications, Methodology (stat.ME), Gaussian mixture model, R packages, Statistics - Machine Learning, Latent class analysis, Applications (stat.AP), Computational methods for problems pertaining to statistics, Statistics - Methodology

Impact by BIP!

Selected citations: 89
Popularity: Top 1%
Influence: Top 10%
Impulse: Top 1%


Green

gold
