
pmid: 20075479
In machine learning, the performance of a classifier is usually measured in terms of its prediction error. In most real-world problems, the error cannot be calculated exactly and must be estimated, so it is important to choose an appropriate estimator of the error. This paper analyzes the statistical properties, namely bias and variance, of the k-fold cross-validation classification error estimator (k-cv). Our main contribution is a novel theoretical decomposition of the variance of the k-cv estimator according to its sources of variance: sensitivity to changes in the training set and sensitivity to changes in the folds. The paper also compares the bias and variance of the estimator for different values of k. The experimental study has been performed in artificial domains because they allow the exact computation of the quantities involved and a rigorous specification of the experimental conditions. The experiments cover two classifiers (naive Bayes and nearest neighbor), different numbers of folds, different sample sizes, and training sets drawn from assorted probability distributions. We conclude with some practical recommendations on the use of k-fold cross-validation.
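To make the estimator concrete, the following is a minimal sketch of a k-fold cross-validation error estimate for a nearest neighbor classifier, repeated over several random partitions of one fixed training set so that the spread of the estimates reflects the sensitivity to changes in the folds. It is not the paper's experimental code: the function names, the two-Gaussian artificial domain, the sample size of 60, and the choice of 5 folds are all illustrative assumptions.

```python
import random
import statistics

def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x (1-NN, squared Euclidean distance)."""
    best = min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return best[1]

def kfold_cv_error(data, k, rng):
    """Estimate the classification error with k-fold cross-validation.

    data: list of (features, label) pairs; k: number of folds;
    rng: random.Random instance used to assign points to folds.
    """
    indices = list(range(len(data)))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in indices if i not in held_out]
        # Count misclassified points in the held-out fold.
        errors += sum(nearest_neighbor_predict(train, data[i][0]) != data[i][1] for i in fold)
    # Every point is tested exactly once, so this is the pooled error rate.
    return errors / len(data)

def sample_dataset(n, rng):
    """Hypothetical artificial domain: two 2-D Gaussian classes with means (0, 0) and (1.5, 1.5)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        mean = 1.5 * label
        data.append(((rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)), label))
    return data

data = sample_dataset(60, random.Random(0))
# Fixed training set, different partitions: the variance of these estimates
# comes only from the choice of folds.
estimates = [kfold_cv_error(data, 5, random.Random(seed)) for seed in range(20)]
print(statistics.mean(estimates), statistics.pvariance(estimates))
```

Drawing a fresh dataset for each repetition instead of reusing `data` would, under the same assumptions, also expose the second source of variance named in the abstract, the sensitivity to changes in the training set.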
