
doi: 10.1111/jvs.12400
handle: 1959.4/unsworks_39204
Abstract

Aim: A ‘good’ classification should provide information about the composition and abundance of the species within communities, if it serves as an informative surrogate for biodiversity. A natural way to formalize this is with a predictive model, where group membership (clusters) is the predictor, and multivariate species data (site by species matrix) is the response. In this study, we aimed to develop a predictive model‐based framework for evaluating the predictive performance of alternative classifications of vegetation communities, and apply it to make objective and automated decisions about classification structure.

Methods: We used GLMs fit to multivariate species data to predict occurrence of individual species with site groupings. We used AIC to estimate predictive performance of alternative models to: (1) identify optimal partitioning of sites among multiple competing flexible‐β clustering solutions; (2) identify species that contribute most to compositional differences between clusters (i.e. characteristic species); and (3) automatically merge clusters to maximize expected predictive performance using an iterative pruning approach. Using field data from southeastern Australia, and simulated data, we demonstrate our approach for common ecological data types (presence/absence, counts, cover–abundance scores, percentage cover). We supply all code and data required for these analyses.

Results: AIC was a useful metric for assessing competing classification solutions. Our method produced outputs that were simple to interpret and required few subjective choices to be made by the user, while performing similarly to the popular OptimClass assessment methodology. Characteristic species defined by predictive performance were consistent between data types, and had good general agreement with existing methods for defining characteristic species. Using model performance to iteratively refine clustering produced classifications with better than expected predictive performance compared to the dendrogram hierarchy, although the flexible‐β hierarchy did a reasonable job of improving predictive performance.

Conclusions: Appropriately specified models are a natural way to maximize the predictive performance of a classification and its associated diagnostics. We show that a model‐based assessment provides a clear decision framework based on data type, offering an objective pathway to make classification assessment decisions, as well as evaluate methodological choice and performance.
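The idea in the Methods (cluster membership as the predictor, species data as the response, AIC as the yardstick) can be illustrated with a minimal sketch. This is not the authors' supplied code. It assumes presence/absence data only, and it uses the fact that a Bernoulli GLM whose only predictor is a grouping factor has fitted probabilities equal to the per-group occurrence frequencies, so no fitting library is needed. The function names (`species_aic`, `bernoulli_loglik`), the toy site-by-species matrix, and the candidate groupings are all hypothetical and invented for illustration.

```python
import math

def bernoulli_loglik(successes, n):
    """Maximised Bernoulli log-likelihood for one group (fitted p = successes / n)."""
    if successes == 0 or successes == n:
        return 0.0  # fitted probability is 0 or 1, so the likelihood is 1
    p = successes / n
    return successes * math.log(p) + (n - successes) * math.log(1 - p)

def species_aic(site_by_species, labels):
    """Per-species AIC for a Bernoulli model with cluster as the only predictor."""
    groups = sorted(set(labels))
    n_species = len(site_by_species[0])
    aics = []
    for j in range(n_species):
        loglik = 0.0
        for g in groups:
            column = [row[j] for row, lab in zip(site_by_species, labels) if lab == g]
            loglik += bernoulli_loglik(sum(column), len(column))
        # One parameter per group for each species: AIC = 2k - 2 log L
        aics.append(2 * len(groups) - 2 * loglik)
    return aics

# Hypothetical toy data: 6 sites (rows) by 3 species (columns), presence/absence.
sites = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]

# Hypothetical competing classifications of the same six sites.
candidates = {
    "one_group": [0, 0, 0, 0, 0, 0],
    "two_groups": [0, 0, 0, 1, 1, 1],
    "six_groups": [0, 1, 2, 3, 4, 5],
}

# (1) Compare classifications by sum-of-AIC across species (lower is better).
totals = {name: sum(species_aic(sites, labs)) for name, labs in candidates.items()}
best = min(totals, key=totals.get)

# (2) Rank species by how much the grouping improves on a single-group model;
# large positive values suggest candidate characteristic species.
null_aic = species_aic(sites, candidates["one_group"])
best_aic = species_aic(sites, candidates[best])
delta_aic = [n - b for n, b in zip(null_aic, best_aic)]

print(best, totals, delta_aic)
```

Under these assumptions the AIC penalty does the work described in the abstract: a grouping with too few clusters fits poorly, while one with too many pays for its extra parameters. Counts, cover–abundance scores, or percentage cover would need a different error distribution and a proper GLM fit rather than this closed-form shortcut.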
anzsrc-for: 3108 Plant biology, 570, anzsrc-for: 3103 Ecology, 3103 Ecology, anzsrc-for: 0705 Forestry Sciences, anzsrc-for: 31 Biological Sciences, anzsrc-for: 0602 Ecology, 31 Biological Sciences, 8.4 Research design and methodologies (health services), anzsrc-for: 0607 Plant Biology
| Indicator | Value |
| --- | --- |
| Selected citations | 12 |
| Popularity | Top 10% |
| Influence | Average |
| Impulse | Top 10% |
