
Reproducibility is imperative for any scientific discovery. More often than not, modern scientific findings rely on statistical analysis of high-dimensional data. At a minimum, reproducibility manifests itself in the stability of statistical results relative to "reasonable" perturbations to the data and to the model used. The jackknife, bootstrap, and cross-validation are based on perturbations to data, while robust statistics methods deal with perturbations to models. In this article, a case is made for the importance of stability in statistics. Firstly, we motivate the necessity of stability for interpretable and reliable encoding models from brain fMRI signals. Secondly, we find strong evidence in the literature to demonstrate the central role of stability in statistical inference, such as sensitivity analysis and effect detection. Thirdly, a smoothing parameter selector based on estimation stability (ES), ES-CV, is proposed for Lasso, in order to bring stability to bear on cross-validation (CV). ES-CV is then utilized in the encoding models to reduce the number of predictors by 60% with almost no loss (1.3%) of prediction performance across over 2,000 voxels. Lastly, a novel "stability" argument is seen to drive new results that shed light on the intriguing interactions between sample-to-sample variability and heavier-tailed error distributions (e.g., double-exponential) in high-dimensional regression models with $p$ predictors and $n$ independent samples. In particular, when $p/n\rightarrow\kappa\in(0.3,1)$ and the error distribution is double-exponential, the Ordinary Least Squares (OLS) estimator is better than the Least Absolute Deviation (LAD) estimator.
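As a rough illustration of what an estimation-stability criterion might measure, the sketch below computes the spread of fitted-value vectors across perturbed data sets, normalized by the squared norm of their average. The abstract does not state the ES formula, so this normalized-variance form, the function name `estimation_stability`, and the input layout are all assumptions made for illustration only.

```python
import numpy as np


def estimation_stability(fits):
    """Normalized spread of fitted-value vectors (an assumed ES-style statistic).

    fits: array-like of shape (V, n), one fitted-value vector per
    perturbed data set (e.g. per cross-validation fold or bootstrap sample).
    """
    fits = np.asarray(fits, dtype=float)
    mean_fit = fits.mean(axis=0)  # average fit across perturbations
    # Mean squared distance of each fit from the average fit.
    spread = np.mean(np.sum((fits - mean_fit) ** 2, axis=1))
    denom = np.sum(mean_fit ** 2)
    if denom == 0.0:
        # Undefined normalization; treat a zero average fit as maximally unstable.
        return float("inf")
    return float(spread / denom)


# Hypothetical usage: two fitted-value vectors from two perturbed data sets.
example = estimation_stability([[1.0, 1.0], [3.0, 3.0]])
```

Smaller values indicate that the fits agree across perturbations. In a selector of this kind, such a statistic would be evaluated along the regularization path and combined with cross-validation; the exact rule is not given in the abstract.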
Published at http://dx.doi.org/10.3150/13-BEJSP14 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
FOS: Computer and information sciences, Statistics & Probability, fMRI, Statistics, Mathematics - Statistics Theory, Statistics Theory (math.ST), stability, cross-validation, double exponential error, estimation stability, high-dim regression, Methodology (stat.ME), robust statistics, FOS: Mathematics, Econometrics, Lasso, movie reconstruction, Statistics - Methodology
| Indicator | Value |
| --- | --- |
| Citations | 94 |
| Popularity | Top 1% |
| Influence | Top 10% |
| Impulse | Top 10% |
