
arXiv: 1701.07086
The Minimum Covariance Determinant (MCD) approach robustly estimates the location and scatter matrix using the subset of given size with lowest sample covariance determinant. Its main drawback is that it cannot be applied when the dimension exceeds the subset size. We propose the Minimum Regularized Covariance Determinant (MRCD) approach, which differs from the MCD in that the scatter matrix is a convex combination of a target matrix and the sample covariance matrix of the subset. A data-driven procedure sets the weight of the target matrix, so that the regularization is only used when needed. The MRCD estimator is defined in any dimension, is well-conditioned by construction and preserves the good robustness properties of the MCD. We prove that so-called concentration steps can be performed to reduce the MRCD objective function, and we exploit this fact to construct a fast algorithm. We verify the accuracy and robustness of the MRCD estimator in a simulation study and illustrate its practical use for outlier detection and regression analysis on real-life high-dimensional data sets in chemistry and criminology.
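The two ingredients named in the abstract, a scatter matrix formed as a convex combination of a target matrix and the subset's sample covariance, and concentration steps that re-select the subset, can be illustrated with a minimal NumPy sketch. This is a simplified illustration, not the authors' algorithm: the function names, the identity target, the random starting subset, and the fixed weight `rho` are all assumptions made here. The paper sets the weight with a data-driven procedure, which this sketch does not reproduce.

```python
import numpy as np


def regularized_cov(X_sub, rho, target=None):
    """Convex combination rho * T + (1 - rho) * S of a target matrix T and
    the sample covariance S of the subset (identity target assumed)."""
    p = X_sub.shape[1]
    T = np.eye(p) if target is None else target
    S = np.cov(X_sub, rowvar=False, bias=True)  # p x p, singular when p >= h
    return rho * T + (1.0 - rho) * S


def concentration_step(X, subset, h, rho):
    """Re-select the h observations with the smallest squared Mahalanobis
    distance relative to the current subset's mean and regularized scatter."""
    mu = X[subset].mean(axis=0)
    K = regularized_cov(X[subset], rho)
    diff = X - mu
    d2 = np.sum(diff * np.linalg.solve(K, diff.T).T, axis=1)
    return np.sort(np.argsort(d2)[:h])


def mrcd_sketch(X, h, rho, n_iter=50, seed=0):
    """Iterate concentration steps from one random subset until it stops
    changing (hypothetical helper; fixed rho, single start)."""
    rng = np.random.default_rng(seed)
    subset = np.sort(rng.choice(X.shape[0], size=h, replace=False))
    for _ in range(n_iter):
        new_subset = concentration_step(X, subset, h, rho)
        if np.array_equal(new_subset, subset):
            break
        subset = new_subset
    mu = X[subset].mean(axis=0)
    return mu, regularized_cov(X[subset], rho), subset


# Example with more variables (p = 50) than subset members (h = 20).
X = np.random.default_rng(1).normal(size=(30, 50))
mu, K, subset = mrcd_sketch(X, h=20, rho=0.1)
```

With a positive weight on a positive definite target, the combined matrix stays invertible even when the dimension exceeds the subset size, which is the situation where the plain MCD cannot be applied.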
FOS: Computer and information sciences, Technology, Statistics & Probability, Estimation in multivariate analysis, ROBUST, breakdown value, Methodology (stat.ME), Statistical aspects of big data and data science, Computer Science, Theory & Methods, Mathematical computer science, Regularization, General nonlinear regression, 4901 Applied mathematics, ALGORITHM, Robustness and adaptive procedures (parametric inference), Mathematical statistics, Statistics - Methodology, 0802 Computation Theory and Mathematics, Computer. Automation, Applications of statistics to social sciences, Science & Technology, 0104 Statistics, Mathematical and quantitative methods, Probability, MULTIVARIATE LOCATION, 4905 Statistics, High-dimensional data, SCATTER, Physical Sciences, Computer Science, 4903 Numerical and computational mathematics, OUTLIER DETECTION, Robust covariance estimation, SDG 12 - Responsible Consumption and Production, criminology, MATRIX, Mathematics
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources. An alternative to the "Influence" indicator, which also reflects the overall impact of an article in the research community at large, based on the underlying citation network (diachronically). | 81 |
| Popularity | The "current" impact or attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 1% |
| Influence | The overall impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| Impulse | The initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
