
The goal of class prediction studies is to develop rules that accurately predict the class membership of new subjects. Classifiers differ in the way they combine the values of the variables available for each subject. Classifiers are frequently developed using class-imbalanced data, where the number of samples in each class is not equal. Standard classification methods applied to class-imbalanced data are often biased towards the majority class: they assign most new samples to the majority class and do not accurately predict the minority class. Data are high-dimensional when the number of variables greatly exceeds the number of subjects. In this paper we show how high dimensionality poses additional challenges for class-imbalanced prediction. We present new simulation studies for five classifiers, extending our previous results to correlated variables, and briefly discuss the results.
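The interaction between class imbalance and high dimensionality can be illustrated with a minimal simulation sketch. It is not the simulation design used in the paper: the nearest-centroid rule, the function name `simulate_majority_fraction`, the sample sizes, and the independent standard normal variables are all assumptions chosen for illustration, and the rule is not necessarily one of the five classifiers studied. No variable differs between the classes, so an unbiased rule would assign about half of the new samples to each class.

```python
import random


def simulate_majority_fraction(p, n_major=40, n_minor=10, n_test=200, seed=0):
    """Fraction of new samples assigned to the majority class by a
    nearest-centroid rule when no variable differs between the classes.

    All names and default values are hypothetical and for illustration only.
    """
    rng = random.Random(seed)

    def centroid(n):
        # Mean of n independent N(0, 1) samples in p dimensions.
        sums = [0.0] * p
        for _ in range(n):
            for j in range(p):
                sums[j] += rng.gauss(0.0, 1.0)
        return [s / n for s in sums]

    c_major = centroid(n_major)
    c_minor = centroid(n_minor)

    to_major = 0
    for _ in range(n_test):
        # New sample drawn from the same distribution as both classes.
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        d_major = sum((xj - cj) ** 2 for xj, cj in zip(x, c_major))
        d_minor = sum((xj - cj) ** 2 for xj, cj in zip(x, c_minor))
        if d_major < d_minor:
            to_major += 1
    return to_major / n_test


if __name__ == "__main__":
    for p in (5, 1000):
        print(p, simulate_majority_fraction(p))
```

Under these assumptions the minority centroid is estimated from fewer samples and is therefore noisier, so the expected squared distance from a new sample to it exceeds the distance to the majority centroid by p(1/n_minor - 1/n_major). This gap grows with the number of variables p, which is why nearly all new samples end up in the majority class when p is large.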
