
doi: 10.2307/2347628
Summary: In this paper it is shown how ridge estimators can be used in logistic regression to improve the parameter estimates and to diminish the error made by further predictions. Different ways to choose the unknown ridge parameter are discussed. The main attention focuses on ridge parameters obtained by cross-validation. Three different ways to define the prediction error are considered: classification error, squared error and minus log-likelihood. The use of ridge regression is illustrated by developing a prognostic index for the two-year survival probability of patients with ovarian cancer as a function of their deoxyribonucleic acid (DNA) histrogram. In this example, the number of covariates is large compared with the number of observations and modelling without restrictions on the parameters leads to overfitting. Defining a restriction on the parameters, such that neighbouring intervals in the DNA histogram differ only slightly in their influence on the survival, yields ridge-type parameter estimates with reasonable values which can be clinically interpreted. Furthermore the model can predict new observations more accurately.
Ridge regression; shrinkage estimators (Lasso), predictive value, logistic regression, ridge regression, deoxyribonucleic acid histogram, cross-validation, Applications of statistics to biology and medical sciences; meta analysis
Ridge regression; shrinkage estimators (Lasso), predictive value, logistic regression, ridge regression, deoxyribonucleic acid histogram, cross-validation, Applications of statistics to biology and medical sciences; meta analysis
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 1K | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 0.1% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 0.1% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
