Data from: R2s for correlated data: phylogenetic models, LMMs, and GLMMs

Many researchers want to report an R2 to measure the variance explained by a model. When a model includes correlations among data, as phylogenetic models and mixed models do, defining an R2 faces two conceptual problems. (i) It is unclear how to measure the variance explained by predictor (independent) variables when the model contains covariances. (ii) Researchers may want the R2 to include the variance explained by the covariances, asking questions such as “How much of the data is explained by phylogeny?” Here, I investigate three R2s for phylogenetic and mixed models. R2resid is an extension of the ordinary least-squares R2 that weights residuals by variances and covariances estimated by the model; it is closely related to R2glmm presented by Nakagawa and Schielzeth (2013). R2pred is based on predicting each residual from the fitted model and computing the variance of the differences between observed and predicted values. R2lik is based on the likelihood of fitted models and therefore reflects the amount of information that the models contain. These three R2s are formulated as partial R2s, making it possible to compare the contributions of predictor variables and variance components (phylogenetic signal and random effects) to the fit of models. Because partial R2s compare a full model with a reduced model that lacks some components of the full model, they are distinct from marginal R2s that partition additive components of the variance. The properties of the R2s for phylogenetic models were assessed using simulations for continuous and binary response data (phylogenetic generalized least squares and phylogenetic logistic regression). Because the R2s are designed broadly for any model of correlated data, the R2s were also compared for LMMs and GLMMs. R2resid, R2pred, and R2lik all have similar performance in describing the variance explained by different components of models.
However, R2pred gives the most direct answer to the question of how much variance in the data is explained by a model. R2resid is most appropriate for comparing models fit to different datasets, because it does not depend on sample sizes. And R2lik is most appropriate to assess the importance of different components within the same model applied to the same data, because it is most closely associated with statistical significance tests.
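To make the partial-R2 idea concrete, the following is a minimal sketch in Python with NumPy; it is not the author's code (the supplement provides R code for the figures). It assumes a Gaussian model fit by generalized least squares with a fixed, known correlation matrix V, whereas in practice V would come from a phylogeny or from random effects and its parameters would be estimated. R2lik is computed in the Cox-Snell form 1 - exp(-2/n * (logL_full - logL_reduced)), and R2pred predicts each observation from the fitted mean plus the conditional expectation of its residual given all other residuals; both formulas are assumptions of this sketch, and R2resid is not sketched. The helper names gls_fit, conditional_predictions, partial_r2s and block_corr are hypothetical, and block_corr is only a stand-in for a phylogenetic correlation matrix.

```python
# Minimal sketch (not the author's code): partial R2lik and R2pred for
# Gaussian GLS models with a fixed correlation matrix V (an assumption).
import numpy as np


def gls_fit(y, X, V):
    """GLS fit of y = X @ beta + e, with Cov(e) = sigma2 * V and V held fixed.

    Returns the coefficients, the residuals, the ML estimate of sigma2
    and the Gaussian log-likelihood at those estimates.
    """
    n = len(y)
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    e = y - X @ beta
    sigma2 = (e @ Vinv @ e) / n  # ML (not REML) estimate
    _, logdet_V = np.linalg.slogdet(V)
    loglik = -0.5 * (n * np.log(2 * np.pi * sigma2) + logdet_V + n)
    return beta, e, sigma2, loglik


def conditional_predictions(X, beta, e, V):
    """Predict each y_i from the fitted mean plus E[e_i | all other e_j].

    For a Gaussian vector with precision matrix Q = V^-1,
    E[e_i | e_-i] = e_i - (Q e)_i / Q_ii (sigma2 cancels out).
    """
    Q = np.linalg.inv(V)
    return X @ beta + e - (Q @ e) / np.diag(Q)


def partial_r2s(y, X_full, V_full, X_red, V_red):
    """Hypothetical partial R2lik and R2pred for a full vs. reduced model."""
    n = len(y)
    beta_f, e_f, _, ll_f = gls_fit(y, X_full, V_full)
    beta_r, e_r, _, ll_r = gls_fit(y, X_red, V_red)
    # Cox-Snell-style likelihood R2 (assumed form)
    r2_lik = 1.0 - np.exp(-2.0 / n * (ll_f - ll_r))
    # Variance of observed minus predicted values, full relative to reduced
    res_f = y - conditional_predictions(X_full, beta_f, e_f, V_full)
    res_r = y - conditional_predictions(X_red, beta_r, e_r, V_red)
    r2_pred = 1.0 - np.var(res_f) / np.var(res_r)
    return {"R2lik": float(r2_lik), "R2pred": float(r2_pred)}


def block_corr(group_sizes, rho):
    """Hypothetical stand-in for a phylogenetic correlation matrix:
    members of the same clade share correlation rho (0 <= rho < 1)."""
    n = sum(group_sizes)
    V = (1.0 - rho) * np.eye(n)
    start = 0
    for size in group_sizes:
        V[start:start + size, start:start + size] += rho
        start += size
    return V


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = block_corr([10, 10, 10], rho=0.6)
    n = V.shape[0]
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + np.linalg.cholesky(V) @ rng.normal(size=n)
    X_full = np.column_stack([np.ones(n), x])
    X_red = np.ones((n, 1))
    # Partial R2s for the predictor x (same correlation structure in both models)
    print(partial_r2s(y, X_full, V, X_red, V))
    # Partial R2s for the correlation structure (same predictors in both models)
    print(partial_r2s(y, X_full, V, X_full, np.eye(n)))
```

When both models use V = I, both quantities reduce to the ordinary least-squares R2, and when the full and reduced models share the same fixed V, R2lik reduces to 1 - sigma2_full / sigma2_reduced, a ratio of GLS residual variances. The second call drops the correlation structure instead of the predictor, which is the sense in which a partial R2 can address questions like "How much of the data is explained by phylogeny?"; because V is fixed rather than estimated in this sketch, those two models are not nested and the resulting values can be negative.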

Ives R2 Supplementary Material
Supplementary Section 1: More comparisons among the R2s
Supplementary Section 2: R code for figures A1 and A2

Related Organizations

University of Wisconsin–Oshkosh
United States

Keywords

Binomial regression, pseudo-likelihood, phylogenetic model, coefficient of determination, non-independent residuals
