$k$-means clustering of extremes

Publication: Article, Preprint, Other literature type

Date: 01 Jan 2020 (Embargo end date: 01 Apr 2019)

Publisher: Institute of Mathematical Statistics

Journal: Electronic Journal of Statistics, volume 14 (ISSN: 1935-7524)

Authors: Janßen, Anja; Wan, Phyllis

DOI: 10.1214/20-ejs1689, 10.48550/arxiv.1904.02970

arXiv: 1904.02970


Abstract

The $k$-means clustering algorithm and its variant, spherical $k$-means clustering, are among the most important and popular methods in unsupervised learning and pattern detection. In this paper, we explore how the spherical $k$-means algorithm can be applied to the analysis of only the extremal observations from a data set. By making use of multivariate extreme value analysis, we show how it can be adapted to find "prototypes" of extremal dependence, and we derive a consistency result for our suggested estimator. In the special case of max-linear models, we furthermore show that our procedure provides an alternative way of statistical inference for this class of models. Finally, we provide data examples which show that our method is able to find relevant patterns in extremal observations and allows us to classify extremal events.
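The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the norm, the threshold rule, or the dissimilarity used, so the code below assumes the Euclidean norm, an empirical quantile of the norms as threshold, and the cosine dissimilarity. The function names `extremal_angles`, `spherical_kmeans`, and `simulate_max_linear` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def extremal_angles(X, quantile=0.95):
    """Keep rows whose Euclidean norm exceeds the empirical quantile
    of all norms (assumed threshold rule) and project them onto the
    unit sphere."""
    norms = np.linalg.norm(X, axis=1)
    threshold = np.quantile(norms, quantile)
    keep = norms > threshold
    return X[keep] / norms[keep, None]


def spherical_kmeans(angles, k, n_iter=100, seed=0):
    """Lloyd-type spherical k-means on unit vectors: assign each point
    to the center with the largest cosine similarity, then replace each
    center by the normalized sum of its members."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct observations (an assumption of this sketch).
    centers = angles[rng.choice(len(angles), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmax(angles @ centers.T, axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = angles[labels == j]
            if len(members) > 0:
                total = members.sum(axis=0)
                length = np.linalg.norm(total)
                if length > 0:
                    new_centers[j] = total / length
            # An empty cluster keeps its previous center.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmax(angles @ centers.T, axis=1)
    return centers, labels


def simulate_max_linear(A, n, seed=0):
    """Simulate X_i = max_j A[i, j] * Z_j with i.i.d. unit Frechet Z_j."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(size=(n, A.shape[1]))
    Z = -1.0 / np.log(U)  # unit Frechet via inverse transform
    return np.max(A[None, :, :] * Z[:, None, :], axis=2)


if __name__ == "__main__":
    # Hypothetical coefficient matrix, used only for illustration.
    A = np.array([[1.0, 0.2],
                  [0.2, 1.0]])
    X = simulate_max_linear(A, n=20000)
    angles = extremal_angles(X, quantile=0.99)
    centers, labels = spherical_kmeans(angles, k=2)
    print("estimated prototypes:\n", centers)
    print("normalized columns of A:\n", (A / np.linalg.norm(A, axis=0)).T)
```

In a max-linear model the extremal directions concentrate on the normalized columns of the coefficient matrix, which is why the demo prints the estimated cluster centers next to those columns for comparison. The threshold quantile and the number of clusters `k` are tuning choices in this sketch.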

Country

Netherlands

Related Organizations

Department of Mathematics
Royal Institute of Technology
Sweden
Erasmus University Rotterdam
Netherlands

Keywords

FOS: Computer and information sciences, Classification and discrimination; cluster analysis (statistical aspects), dimension reduction, Statistics of extreme values; tail inference, 62H30, 60G70, 62G32, spectral measure, extreme value statistics, Extreme value theory; extremal stochastic processes, Methodology (stat.ME), Inference from stochastic processes and spectral analysis, $k$-means clustering, Statistics - Methodology

Impact by BIP!

- selected citations: 25. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.