Exact Post-Selection Inference for Changepoint Detection and Other Generalized Lasso Problems

Preprint (English)
Hyun, Sangwon ; G'Sell, Max ; Tibshirani, Ryan J. (2016)
  • Subject: Statistics - Methodology
    arxiv: Statistics::Computation

We study tools for inference conditioned on model selection events that are defined by the generalized lasso regularization path. The generalized lasso estimate is the solution of a penalized least squares regression problem, where the penalty is the l1 norm of a matrix D times the coefficient vector. The generalized lasso path collects these estimates over a range of penalty parameter (λ) values. Leveraging a sequential characterization of this path from Tibshirani & Taylor (2011), and recent advances in post-selection inference from Lee et al. (2016) and Tibshirani et al. (2016), we develop exact hypothesis tests and confidence intervals for linear contrasts of the underlying mean vector, conditioned on any model selection event along the generalized lasso path (assuming Gaussian errors in the observations). By inspecting specific choices of D, we obtain post-selection tests and confidence intervals for specific cases of generalized lasso estimates, such as the fused lasso, trend filtering, and the graph fused lasso. In the fused lasso case, the underlying coordinates of the mean are assigned a linear ordering, and our framework allows us to test selectively chosen breakpoints or changepoints in these mean coordinates. This is an interesting and well-studied problem with broad applications; our framework applied to trend filtering and the graph fused lasso serves several applications as well. Aside from the development of selective inference tools, we describe several practical aspects of our methods, such as valid post-processing of generalized lasso estimates before performing inference in order to improve power, and problem-specific visualization aids that may be given to the data analyst for choosing linear contrasts to be tested. Many examples, from both simulated and real data sources, are presented to examine the empirical properties of our inference methods.
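To make the two core ingredients of the abstract concrete, the sketch below shows (i) the first-difference matrix D whose l1 penalty ‖Dβ‖₁ defines the 1D fused lasso, and (ii) the truncated-Gaussian p-value that post-selection inference in the style of Lee et al. (2016) produces for a linear contrast v, once the selection event has been reduced to an interval [Vlo, Vup] for v'y. This is a minimal illustration under simplifying assumptions (scalar interface, truncation limits already computed, known noise scale); the function names are invented here and are not from the authors' code.

```python
import math


def fused_lasso_D(n):
    """First-difference penalty matrix D for the 1D fused lasso:
    row i of D maps beta to beta[i+1] - beta[i], so ||D @ beta||_1
    penalizes changepoints in an ordered mean vector."""
    return [[1 if j == i + 1 else (-1 if j == i else 0) for j in range(n)]
            for i in range(n - 1)]


def norm_cdf(x):
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncated_gaussian_pvalue(vty, sigma_v, vlo, vup):
    """One-sided p-value for H0: v' theta = 0, conditional on the
    selection event {vlo <= v' y <= vup} (a polyhedral-lemma-style
    reduction). Here vty = v' y (observed) and sigma_v = sigma * ||v||_2.
    The null conditional law of v' y is N(0, sigma_v^2) truncated to
    [vlo, vup], so the p-value is a ratio of normal CDF differences."""
    num = norm_cdf(vup / sigma_v) - norm_cdf(vty / sigma_v)
    den = norm_cdf(vup / sigma_v) - norm_cdf(vlo / sigma_v)
    return num / den
```

For example, observing v'y = 2 with sigma_v = 1 and truncation interval [1, ∞) gives a p-value of about 0.143 rather than the unadjusted 1 − Φ(2) ≈ 0.023, reflecting the cost of conditioning on selection.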
  • References (31)

    Arnold, T. & Tibshirani, R. J. (2016), 'Efficient implementations of the generalized lasso dual path algorithm', Journal of Computational and Graphical Statistics 25(1), 1-27.

    Bai, J. (1999), 'Likelihood ratio tests for multiple structural changes', Journal of Econometrics 91(2), 299-323.

    Berk, R., Brown, L., Buja, A., Zhang, K. & Zhao, L. (2013), 'Valid post-selection inference', Annals of Statistics 41(2), 802-837.

    Brodsky, B. & Darkhovski, B. (1993), Nonparametric Methods in Change-Point Problems, Springer, Netherlands.

    Chambolle, A. & Darbon, J. (2009), 'On total variation minimization and surface evolution using parametric maximum flows', International Journal of Computer Vision 84, 288-307.

    Chen, J. & Chen, Z. (2008), 'Extended Bayesian information criteria for model selection with large model spaces', Biometrika 95(3), 759-771.

    Choi, Y., Taylor, J. & Tibshirani, R. (2014), Selecting the number of principal components: estimation of the true rank of a noisy matrix. arXiv: 1410.8260.

    Eckley, I., Fearnhead, P. & Killick, R. (2011), Analysis of changepoint models, in D. Barber, T. Cemgil & S. Chiappa, eds, 'Bayesian Time Series Models', Cambridge University Press, Cambridge, chapter 10, pp. 205-224.

    Fithian, W., Sun, D. & Taylor, J. (2014), Optimal inference after model selection. arXiv: 1410.2597.

    Fithian, W., Taylor, J., Tibshirani, R. & Tibshirani, R. J. (2015), Selective sequential model selection. arXiv: 1512.02565.
