The PIT-trap: A "model-free" bootstrap procedure for inference about regression models with discrete, multivariate responses.
David I Warton
Yi Alice Wang
Publisher: Public Library of Science (PLoS) (ISSN: 1932-6203, eISSN: 1932-6203)
Probability Theory | Research Article | Crustaceans | Mathematics | Ecology and Environmental Sciences | Mathematical and Statistical Techniques | Test Statistics | Crabs | Statistical Distributions | Simulation and Modeling | Ecology | Physical Sciences | Probability Distribution | Animals | Statistics (Mathematics) | Biology and Life Sciences | Copepods | Theoretical Ecology | Research and Analysis Methods | Arthropoda | Medicine | Q | R | Science | Organisms | Invertebrates | Statistical Methods
arxiv: Statistics::Methodology | Statistics::Theory
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties arise when extending residual resampling to regression settings where residuals are not identically distributed and thus not amenable to bootstrapping. Common examples include logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap; it assumes that data come from some marginal distribution F of known parametric form. This method can be understood as a type of “model-free bootstrap”, adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. For multivariate data, bootstrapping rows of PIT-residuals preserves correlation in the data without the need to model it, a key point of difference from a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties compared to competing resampling methods.
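The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: the marginal distribution F is taken to be Poisson, the fitted means `mu` are treated as given rather than estimated, and discrete responses are handled with a uniformly jittered (randomized) PIT so that residuals fall strictly inside the interval between F(y - 1) and F(y). All function names are hypothetical.

```python
import math
import random

def poisson_pmf(k, mu):
    """Poisson probability mass at k (0 for negative k)."""
    if k < 0:
        return 0.0
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def poisson_cdf(k, mu):
    """Poisson CDF F(k), accumulated term by term from 0 to k."""
    return sum(poisson_pmf(j, mu) for j in range(k + 1))

def poisson_quantile(u, mu, max_k=10_000):
    """Smallest k with F(k) >= u (max_k is a safety cap for u near 1)."""
    k = 0
    cdf = poisson_pmf(0, mu)
    while cdf < u and k < max_k:
        k += 1
        cdf += poisson_pmf(k, mu)
    return k

def pit_residuals(y, mu, rng):
    """Randomized PIT residuals u_ij = F(y_ij - 1) + q_ij * f(y_ij).

    Assumption: q_ij is uniform on (0, 1], which is one common way to make
    the PIT continuous for discrete responses.
    """
    return [
        [poisson_cdf(yij - 1, mij) + (1.0 - rng.random()) * poisson_pmf(yij, mij)
         for yij, mij in zip(y_row, mu_row)]
        for y_row, mu_row in zip(y, mu)
    ]

def pit_trap_resample(y, mu, rng):
    """One bootstrap data set: resample whole rows of PIT residuals with
    replacement, then map them back through the fitted marginal quantiles."""
    u = pit_residuals(y, mu, rng)
    n = len(u)
    picked = [rng.randrange(n) for _ in range(n)]  # row indices, with replacement
    return [
        [poisson_quantile(u[r][j], mu[i][j]) for j in range(len(mu[i]))]
        for i, r in enumerate(picked)
    ]

if __name__ == "__main__":
    rng = random.Random(1)
    y = [[0, 3], [2, 1], [5, 0], [1, 4]]    # toy counts: 4 sites, 2 taxa
    mu = [[1.0, 2.5], [2.0, 1.5], [4.0, 0.5], [1.5, 3.0]]  # assumed fitted means
    print(pit_trap_resample(y, mu, rng))
```

Because entire rows of residuals are resampled together, any dependence between the columns of a row is carried into the bootstrap sample without being modelled, while each resampled value is mapped back through the fitted marginal for its own row. In an actual analysis the means would come from a fitted regression model and the test statistic would be recomputed on each bootstrap data set; both steps are omitted here.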