Powered by OpenAIRE graph
Found an issue? Give us feedback
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

The Multicollinearity Effect on the Performance of Machine Learning Algorithms: Case Examples in Healthcare Modelling

Authors: Hasan Yıldırım;

The Multicollinearity Effect on the Performance of Machine Learning Algorithms: Case Examples in Healthcare Modelling

Abstract

Background: The data extracted from various fields inherently consists of extremely correlated measurements in parallel with the exponential increase in the size of the data that need to be interpreted owing to the technological advances. This problem, called the multicollinearity, influences the performance of both statistical and machine learning algorithms. Statistical models proposed as a potential remedy to this problem have not been sufficiently evaluated in the literature. Therefore, a comprehensive comparison of statistical and machine learning models is required for addressing the multicollinearity problem. Methods: Statistical models (including Ridge, Liu, Lasso and Elastic Net regression) and the eight most important machine learning algorithms (including Cart, Knn, Mlp, MARS, Cubist, Svm, Bagging and XGBoost) are comprehensively compared by using two different healthcare datasets (including Body Fat and Cancer) having multicollinearity problem. The performance of the models is assessed through cross validation methods via root mean square error, mean absolute error and r-squared criteria. Results: The results of the study revealed that statistical models outperformed machine learning models in terms of root mean square error, mean absolute error and r-squared criteria in both training and testing performance. Particularly the Liu regression often achieved better relative performance (up to 7.60% to 46.08% for Body Fat data set and up to 1.55% to 21.53% for Cancer data set on training performance and up to 1.56% to 38.08% for Body Fat data set and up to 3.50% to 23.29% for Cancer data set on testing performance) among regression methods as well as compared to machine algorithms. Conclusions: Liu regression is mostly disregarded in the machine learning literature, but since it outperforms the most powerful and widely used machine learning algorithms, it appears to be a promising tool in almost all fields, especially for regression-based studies including data with multicollinearity problem.

Related Organizations
Keywords

Makine Öğrenme (Diğer), Machine Learning Algorithms, Denetimli Öğrenme, machine learning;multicollinearity;feature selection;artificial intelligence;collinearity, Supervised Learning, Makine Öğrenmesi Algoritmaları, Machine Learning (Other)

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    13
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Top 10%
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Top 10%
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
13
Top 10%
Average
Top 10%
Related to Research communities
Cancer Research
Upload OA version
Are you the author of this publication? Upload your Open Access version to Zenodo!
It’s fast and easy, just two clicks!