
arXiv: 1008.0526
Consider the standard Gaussian linear regression model $Y = X\beta_0 + \varepsilon$, where $Y \in \mathbb{R}^n$ is a response vector and $X \in \mathbb{R}^{n\times p}$ is a design matrix. Much work has been devoted to building efficient estimators of $\beta_0$ when $p$ is much larger than $n$. In this situation, a classical approach is to assume that $\beta_0$ is approximately sparse. This paper studies the minimax risks of estimation and testing over classes of $k$-sparse vectors $\beta_0$. These bounds shed light on the limitations due to high dimensionality. The results encompass the problem of prediction (estimation of $X\beta_0$), the inverse problem (estimation of $\beta_0$), and linear testing (testing $X\beta_0 = 0$). Interestingly, an elbow effect occurs when $k\log(p/k)$ becomes large compared to $n$: the minimax risks and hypothesis separation distances blow up in this ultra-high dimensional setting. We also prove that even dimension-reduction techniques cannot provide satisfactory results in an ultra-high dimensional setting. Moreover, we compute the minimax risks when the variance of the noise is unknown. The knowledge of this variance is shown to play a significant role in the optimal rates of estimation and testing. All these minimax bounds provide a characterization of statistical problems that are so difficult that no procedure can provide satisfactory results.
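The sparse regression setup described above can be sketched in a few lines. This is a hypothetical illustration (the dimensions $n$, $p$, $k$ and the unit signal/noise values are arbitrary choices, not taken from the paper); it generates one instance of the model and computes the ratio $k\log(p/k)/n$ that governs the elbow effect.

```python
import numpy as np

# Sketch of the model Y = X beta_0 + eps, with beta_0 a k-sparse
# vector in R^p and n much smaller than p.
rng = np.random.default_rng(0)

n, p, k = 50, 1000, 5                 # sample size, ambient dimension, sparsity
X = rng.standard_normal((n, p))       # Gaussian design matrix
beta0 = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta0[support] = 1.0                  # k non-zero coordinates (arbitrary value)
eps = rng.standard_normal(n)          # Gaussian noise, variance 1
Y = X @ beta0 + eps

# The minimax rates change regime when k*log(p/k) becomes large
# compared to n (the ultra-high dimensional setting).
ratio = k * np.log(p / k) / n
print(f"k log(p/k) / n = {ratio:.2f}")
```

With these particular values the ratio is about 0.53, i.e. still below the ultra-high dimensional regime; increasing $p$ or $k$ (or shrinking $n$) pushes the instance past the elbow.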
Keywords: model selection; high-dimensional geometry; dimension reduction; minimax procedures in statistical decision theory; adaptive estimation; minimax risk; risk estimation; minimax hypothesis testing; high-dimensional regression; sparse vectors; linear regression; mixed models. Subjects: Mathematics - Statistics Theory (math.ST); MSC 62J05, 62C20, 62F35.
| Indicator | Description | Value |
|-----------|-------------|-------|
| Selected citations | Citations derived from selected sources; an alternative to the "influence" indicator, which reflects the overall/total impact based on the underlying citation network (diachronically). | 70 |
| Popularity | Reflects the current impact/attention (the "hype") of the article in the research community at large, based on the underlying citation network. | Top 10% |
| Influence | Reflects the overall/total impact of the article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| Impulse | Reflects the initial momentum of the article directly after its publication, based on the underlying citation network. | Top 10% |
