Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism

Article, Preprint, Other literature type (open access)
Arias-Castro, Ery; Candès, Emmanuel J.; Plan, Yaniv (2011)
  • Publisher: Institute of Mathematical Statistics
  • Journal: The Annals of Statistics (ISSN: 0090-5364)
  • Related identifiers: doi: 10.1214/11-AOS910
  • Subject: incoherence | random matrices | Mathematics - Statistics Theory | analysis of variance | 94A13 | Statistics - Methodology | minimax detection | 62G20 | higher criticism | 62G10 | suprema of Gaussian processes | compressive sensing | Detecting a sparse signal

Testing for the significance of a subset of regression coefficients in a linear model, a staple of statistical analysis, goes back at least to the work of Fisher who introduced the analysis of variance (ANOVA). We study this problem under the assumption that the coefficient vector is sparse, a common situation in modern high-dimensional settings. Suppose we have $p$ covariates and that under the alternative, the response only depends upon the order of $p^{1-\alpha}$ of those, $0\le\alpha\le1$. Under moderate sparsity levels, that is, $0\le\alpha\le1/2$, we show that ANOVA is essentially optimal under some conditions on the design. This is no longer the case under strong sparsity constraints, that is, $\alpha>1/2$. In such settings, a multiple comparison procedure is often preferred and we establish its optimality when $\alpha\geq3/4$. However, these two very popular methods are suboptimal, and sometimes powerless, under moderately strong sparsity where $1/2<\alpha<3/4$. We suggest a method based on the higher criticism that is powerful in the whole range $\alpha>1/2$. This optimality property is true for a variety of designs, including the classical (balanced) multi-way designs and more modern "$p>n$" designs arising in genetics and signal processing. In addition to the standard fixed effects model, we establish similar results for a random effects model where the nonzero coefficients of the regression vector are normally distributed.
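The abstract contrasts three global tests: ANOVA, a multiple comparison (max-type) procedure, and the higher criticism. As a rough illustration only, the Python sketch below computes all three in the simplest setting, an orthonormal design (an assumption made here), where the regression problem reduces to observing y = β + z with z ~ N(0, I_p). The function names, the sparse-signal generator with amplitude μ = sqrt(2 r log p), and the specific higher-criticism variant (Donoho and Jin's form taken over the smallest half of the two-sided p-values) are illustrative assumptions, not the paper's exact procedures or calibrations.

```python
import math
import random


def two_sided_pvalue(z: float) -> float:
    """P(|N(0,1)| >= |z|), computed as erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def anova_stat(y) -> float:
    """Standardized chi-square (ANOVA-type) statistic (sum_i y_i^2 - p) / sqrt(2p).

    Under the null y ~ N(0, I_p), sum_i y_i^2 ~ chi^2_p, so this is
    approximately N(0, 1) for large p.
    """
    p = len(y)
    return (sum(v * v for v in y) - p) / math.sqrt(2.0 * p)


def max_stat(y) -> float:
    """Max-type multiple comparison statistic max_i |y_i|.

    Under the null it concentrates near sqrt(2 log p) (Bonferroni-style threshold).
    """
    return max(abs(v) for v in y)


def higher_criticism(y, alpha0: float = 0.5) -> float:
    """Donoho-Jin style higher criticism over the smallest alpha0 * p p-values.

    HC = max_i sqrt(p) * (i/p - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(p) are the sorted two-sided p-values.
    (Assumed variant; other normalizations exist.)
    """
    p = len(y)
    pvals = sorted(two_sided_pvalue(v) for v in y)
    best = -math.inf
    for i, pv in enumerate(pvals[: max(1, int(alpha0 * p))], start=1):
        # Clamp away from 0 and 1 to avoid division by zero.
        pv = min(max(pv, 1e-300), 1.0 - 1e-12)
        hc = math.sqrt(p) * (i / p - pv) / math.sqrt(pv * (1.0 - pv))
        best = max(best, hc)
    return best


def simulate(p: int, alpha: float, amplitude_r: float, seed: int = 0):
    """Hypothetical generator: y = beta + z with about p**(1 - alpha) nonzeros.

    Each nonzero coefficient equals sqrt(2 * r * log p); r = 0 gives the null.
    """
    rng = random.Random(seed)
    k = max(1, round(p ** (1.0 - alpha)))
    mu = math.sqrt(2.0 * amplitude_r * math.log(p))
    y = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for j in rng.sample(range(p), k):
        y[j] += mu
    return y


if __name__ == "__main__":
    p = 10_000
    for label, r in [("null", 0.0), ("alternative", 0.5)]:
        y = simulate(p, alpha=0.6, amplitude_r=r, seed=1)
        print(label, round(anova_stat(y), 2), round(max_stat(y), 2),
              round(higher_criticism(y), 2))
```

In practice each statistic would be compared against a critical value; calibrating all three by simulating y under the null (amplitude_r = 0) avoids relying on slowly converging asymptotic thresholds. Varying alpha in the range 1/2 < α < 3/4 is where, according to the abstract, ANOVA and multiple comparisons can be suboptimal while the higher criticism remains powerful.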
