Breakdown points for maximum likelihood estimators of location–scale mixtures

Publication type: Article, Research, Preprint, Other literature type
Publication date: 01 Aug 2004
Embargo end date: 01 Jan 2004
Countries: United Kingdom, Switzerland, Italy
Publisher: Institute of Mathematical Statistics
Journal: The Annals of Statistics, volume 32 (ISSN: 0090-5364)

Authors: Hennig, C.

doi: 10.1214/009053604000000571 , 10.48550/arxiv.math/0410073 , 10.3929/ethz-a-004336493

arXiv: math/0410073

handle: 11585/1031443 , 20.500.11850/146304

Abstract

ML-estimation based on mixtures of Normal distributions is a widely used tool for cluster analysis. However, a single outlier can make the parameter estimation of at least one of the mixture components break down. Among others, the estimation of mixtures of t-distributions by McLachlan and Peel [Finite Mixture Models (2000) Wiley, New York] and the addition of a further mixture component accounting for "noise" by Fraley and Raftery [The Computer J. 41 (1998) 578-588] were suggested as more robust alternatives. In this paper, the definition of an adequate robustness measure for cluster analysis is discussed and bounds for the breakdown points of the mentioned methods are given. It turns out that the two alternatives, while adding stability in the presence of outliers of moderate size, do not possess a substantially better breakdown behavior than estimation based on Normal mixtures. If the number of clusters s is treated as fixed, r additional points suffice for all three methods to let the parameters of r clusters explode. Only in the case of r = s is this not possible for t-mixtures. The ability to estimate the number of mixture components, for example, by use of the Bayesian information criterion of Schwarz [Ann. Statist. 6 (1978) 461-464], and to isolate gross outliers as clusters of one point, is crucial for an improved breakdown behavior of all three techniques. Furthermore, a mixture of Normals with an improper uniform distribution is proposed to achieve more robustness in the case of a fixed number of components.
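The breakdown mechanism for Normal mixtures with a fixed number of components can be illustrated with a small numerical sketch. It is not the paper's proof and it does not run an actual ML fit: it only compares the log-likelihood of two hand-picked candidate parameter sets on a toy one-dimensional sample with one added point. The data, the candidate parameters, the scale lower bound SIGMA_MIN and all function names are assumptions made for illustration.

```python
import math

SIGMA_MIN = 0.1  # assumed lower bound on component scales (illustrative value)

# Toy sample (an assumption): two well-separated clusters around -5 and 5.
CLEAN = [-5.2, -5.1, -5.0, -4.9, -4.8, 4.8, 4.9, 5.0, 5.1, 5.2]


def log_normal_pdf(x, mean, sd):
    """Log-density of a Normal(mean, sd^2) distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def mixture_loglik(data, components):
    """Log-likelihood of data under a Normal mixture.

    components is a list of (weight, mean, sd) tuples. Log-sum-exp is used
    so that far outliers do not underflow to log(0).
    """
    total = 0.0
    for x in data:
        terms = [math.log(w) + log_normal_pdf(x, m, s) for w, m, s in components]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def candidate_logliks(outlier):
    """Return (intact, broken) log-likelihoods for CLEAN plus one added point.

    intact: one component per cluster, ignoring the added point.
    broken: one component pooled over both clusters, one component placed
            on the added point with the smallest allowed scale.
    """
    data = CLEAN + [outlier]
    n = len(data)
    intact = [(0.5, -5.0, 0.2), (0.5, 5.0, 0.2)]
    broken = [((n - 1) / n, 0.0, 5.0), (1 / n, outlier, SIGMA_MIN)]
    return mixture_loglik(data, intact), mixture_loglik(data, broken)


for outlier in (6.0, 1000.0):
    intact, broken = candidate_logliks(outlier)
    print(f"outlier={outlier:g}: intact={intact:.1f}, broken={broken:.1f}")
```

For the moderate added point the intact candidate has the higher log-likelihood, while for the gross outlier the ordering reverses: the intact candidate's log-likelihood decreases quadratically in the outlier's distance, whereas the broken candidate's stays essentially constant. This is consistent with the abstract's statement that a single outlier can make at least one component break down.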

Published by the Institute of Mathematical Statistics (http://www.imstat.org) in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/009053604000000571

Countries

United Kingdom, Switzerland, Italy

Related Organizations

ETH Zurich
Switzerland
University College London
United Kingdom
Alma Mater Studiorum University of Bologna
Italy
Universität Hamburg
Germany

Keywords

Model-based cluster analysis, robust statistics, Normal mixtures, mixtures of t-distributions, noise component, classification breakdown point, Mixtures of distributions (probability theory), Maximum likelihood estimation (mathematical statistics), Point estimation, Robustness and adaptive procedures (parametric inference), Classification and discrimination; cluster analysis (statistical aspects), Mathematics, Mathematics - Statistics Theory, Statistics Theory (math.ST), FOS: Mathematics, info:eu-repo/classification/ddc/510, 62F35 (Primary) 62H30 (Secondary)

Impact by BIP!

	selected citations	98
	popularity	Top 10%
	influence	Top 10%
	impulse	Top 10%
