A general family of trimmed estimators for robust high-dimensional data analysis

Publication: Article, Preprint, Other literature type, 01 Jan 2018

Embargo end date: 01 Jan 2016

Publisher: Institute of Mathematical Statistics

Journal: Electronic Journal of Statistics, volume 12 (issn: 1935-7524)

Authors: Yang, Eunho; Lozano, Aurélie C.; Aravkin, Aleksandr

doi: 10.1214/18-ejs1470 , 10.48550/arxiv.1605.08299

arXiv: 1605.08299


Abstract

We consider the problem of robustifying high-dimensional structured estimation. Robust techniques are key in real-world applications, which often involve outliers and data corruption. We focus on trimmed versions of structurally regularized M-estimators in the high-dimensional setting, including the popular Least Trimmed Squares estimator, as well as analogous estimators for generalized linear models and graphical models, using possibly non-convex loss functions. We present a general analysis of their statistical convergence rates and consistency, and then take a closer look at the trimmed versions of the Lasso and Graphical Lasso estimators as special cases. On the optimization side, we show how to extend algorithms for M-estimators to fit trimmed variants and provide guarantees on their numerical convergence. The generality and competitive performance of high-dimensional trimmed estimators are illustrated numerically on both simulated and real-world genomics data.
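To make the idea of a trimmed Lasso concrete, the following is a minimal illustrative sketch, not the algorithm analyzed in the paper. It assumes a squared loss with an L1 penalty and uses a simple alternating heuristic in the spirit of Least Trimmed Squares: fit a Lasso on the currently retained samples, then keep the `h` samples with the smallest squared residuals. The function names (`soft_threshold`, `lasso_ista`, `trimmed_lasso`) and the parameters `lam`, `h`, `n_iter`, and `n_outer` are hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=500):
    """Plain Lasso via proximal gradient (ISTA) on
    (1 / (2n)) * ||y - X theta||^2 + lam * ||theta||_1."""
    n, p = X.shape
    theta = np.zeros(p)
    # Lipschitz constant of the smooth part: largest eigenvalue of X^T X / n.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    if lipschitz == 0.0:
        return theta
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - grad / lipschitz, lam / lipschitz)
    return theta


def trimmed_lasso(X, y, lam, h, n_outer=20):
    """Alternating heuristic (an assumption of this sketch):
    refit the Lasso on the h samples with the smallest squared residuals
    until the retained set stops changing or n_outer rounds have run."""
    n = X.shape[0]
    keep = np.arange(n)              # start from all samples
    theta = lasso_ista(X, y, lam)
    for _ in range(n_outer):
        resid2 = (y - X @ theta) ** 2
        new_keep = np.sort(np.argsort(resid2)[:h])
        if np.array_equal(new_keep, keep):
            break                    # retained set is stable
        keep = new_keep
        theta = lasso_ista(X[keep], y[keep], lam)
    return theta, keep
```

A call such as `theta, keep = trimmed_lasso(X, y, lam=0.1, h=int(0.9 * len(y)))` would return the coefficient estimate together with the indices of the samples treated as inliers. The trimming level `h` is a tuning choice in this sketch; samples outside `keep` contribute nothing to the final fit, which is what limits the influence of gross outliers.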

39 pages, 6 figures

Keywords

FOS: Computer and information sciences, Ridge regression; shrinkage estimators (Lasso), high-dimensional variable selection, Estimation in multivariate analysis, Machine Learning (stat.ML), robust estimation, Applications of statistics to biology and medical sciences; meta analysis, Statistics - Machine Learning, sparse learning, Robustness and adaptive procedures (parametric inference), Lasso, 65K10, 90C06, 62F35, 47N30, Probabilistic graphical models

Impact by BIP!

	selected citations	16
	popularity	Top 10%
	influence	Top 10%
	impulse	Top 10%

