Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ Fırat Üniversitesi M...arrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
Fırat Üniversitesi Mühendislik Bilimleri Dergisi
Article . 2025 . Peer-reviewed
Data sources: Crossref
versions View all 2 versions
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

The Impact of Balancing Techniques and Feature Selection on Machine Learning Models for Diabetes Detection

Authors: Vahid Sinap;

The Impact of Balancing Techniques and Feature Selection on Machine Learning Models for Diabetes Detection

Abstract

The detection of diabetes is crucial for effective management and prevention of the disease, which poses significant health risks globally. This study introduces a novel approach to diabetes detection by combining advanced data balancing techniques and feature selection methods, including Lasso (L1) regularization, to enhance the performance of predictive models in imbalanced datasets. Techniques such as Random Under Sampling (RUS), Adaptive Synthetic Sampling (ADASYN), and Synthetic Minority Over-sampling Technique (SMOTE) were employed alongside models including Random Forest (RF), CatBoost (CB), Extreme Gradient Boosting (XGB), K-Nearest Neighbors (KNN), Gaussian Naive Bayes (GNB), Logistic Regression (LR), and Gradient Boosting (GB) to assess their impact on model accuracy and generalization capabilities. The findings reveal that the RF model achieved the highest accuracy of 93.25% when utilizing the SMOTE technique, underscoring the importance of appropriate data handling strategies in improving predictive outcomes. Furthermore, when all features were utilized without selection, the RF model attained an accuracy of 95.31%, indicating the model’s capacity to capture complex patterns when feature richness is maximized. The comprehensive methodology used in the study achieved a higher accuracy in diabetes detection than research in the literature and provided important outputs for developing reliable prediction models in healthcare.

Related Organizations
Keywords

Makine Öğrenme (Diğer), Diabetes detection;data balancing techniques;imbalanced datasets;predictive modeling;health informatics, Diyabet tespiti;veri dengeleme teknikleri;dengesiz veri setleri;tahmine dayalı modelleme;sağlık bilişimi, Machine Learning (Other)

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    1
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
1
Average
Average
Average
gold