The Impact of Balancing Techniques and Feature Selection on Machine Learning Models for Diabetes Detection

Creator: Vahid Sinap
Keywords: Machine Learning (Other); diabetes detection; data balancing techniques; imbalanced datasets; predictive modeling; health informatics

Fırat Üniversitesi Mühendislik Bilimleri Dergisi: Article, 2025, peer-reviewed (data source: Crossref)

TÜBİTAK ULAKBİM DergiPark: Article, 2024 (data source: TÜBİTAK ULAKBİM DergiPark)

Article, published 27 Mar 2025. Publisher: Firat Universitesi. Journal: Fırat Üniversitesi Mühendislik Bilimleri Dergisi, volume 37, pages 303-320 (ISSN: 1308-9072).

Authors: Vahid Sinap

doi: 10.35234/fumbd.1556260

Abstract

Detecting diabetes is crucial for the effective management and prevention of a disease that poses significant health risks worldwide. This study introduces a novel approach to diabetes detection that combines data balancing techniques with feature selection methods, including Lasso (L1) regularization, to improve the performance of predictive models on imbalanced datasets. Balancing techniques such as Random Under Sampling (RUS), Adaptive Synthetic Sampling (ADASYN), and Synthetic Minority Over-sampling Technique (SMOTE) were applied together with models including Random Forest (RF), CatBoost (CB), Extreme Gradient Boosting (XGB), K-Nearest Neighbors (KNN), Gaussian Naive Bayes (GNB), Logistic Regression (LR), and Gradient Boosting (GB) to assess their effect on accuracy and generalization. With feature selection applied, the RF model achieved the highest accuracy, 93.25%, when combined with SMOTE, underscoring the importance of appropriate data-handling strategies for predictive performance. When all features were used without selection, the RF model reached an accuracy of 95.31%, suggesting that it can capture complex patterns when the full feature set is available. Overall, the methodology achieved higher diabetes-detection accuracy than previously reported studies in the literature and offers useful guidance for building reliable prediction models in healthcare.
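The abstract names the balancing techniques but not how they were configured. As a rough illustration only, the stdlib-only Python sketch below implements the two simplest ideas for a binary label. Random under-sampling (RUS) discards majority-class rows until both classes are the same size. SMOTE creates synthetic minority rows on the line segment between a minority row and one of its k nearest minority neighbors. The function names, `k`, and the seeds are illustrative assumptions, not the author's code. ADASYN, which is not shown, follows the SMOTE recipe but generates more synthetic rows for minority points whose neighborhoods are dominated by the majority class. In practice these techniques usually come from the imbalanced-learn library (`RandomUnderSampler`, `SMOTE`, `ADASYN`). They are normally applied to the training split only, so that synthetic rows never reach the evaluation data.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Row = List[float]


def random_under_sample(
    X: Sequence[Row], y: Sequence[int], seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """RUS: keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out: List[Row] = []
    y_out: List[int] = []
    for label in sorted(counts):
        indices = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.sample(indices, target):
            X_out.append(list(X[i]))
            y_out.append(label)
    return X_out, y_out


def _squared_distance(a: Row, b: Row) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def smote(
    X: Sequence[Row], y: Sequence[int], minority: int, k: int = 5, seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """SMOTE (binary case): add synthetic minority rows until both classes match."""
    rng = random.Random(seed)
    minority_rows = [list(x) for x, lab in zip(X, y) if lab == minority]
    if len(minority_rows) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    k = min(k, len(minority_rows) - 1)
    # Precompute the k nearest minority neighbors of every minority row.
    neighbors = []
    for i, row in enumerate(minority_rows):
        others = [j for j in range(len(minority_rows)) if j != i]
        others.sort(key=lambda j: _squared_distance(row, minority_rows[j]))
        neighbors.append(others[:k])
    n_needed = sum(1 for lab in y if lab != minority) - len(minority_rows)
    X_out = [list(x) for x in X]
    y_out = list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority_rows))
        j = rng.choice(neighbors[i])
        gap = rng.random()  # uniform in [0, 1)
        base, neighbor = minority_rows[i], minority_rows[j]
        # Synthetic row = base + gap * (neighbor - base): a point on the segment.
        X_out.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
        y_out.append(minority)
    return X_out, y_out


if __name__ == "__main__":
    # Made-up toy data: 8 negatives (label 0) and 3 positives (label 1).
    X = [[float(i), float(i % 3)] for i in range(8)]
    X += [[10.0, 10.0], [11.0, 12.0], [12.0, 11.0]]
    y = [0] * 8 + [1] * 3
    print(Counter(smote(X, y, minority=1, k=2)[1]))
    print(Counter(random_under_sample(X, y)[1]))
```

SMOTE keeps every original row and grows the minority class. RUS shrinks the majority class and so throws information away. That trade-off is why the two are usually compared side by side.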
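The abstract lists Lasso (L1) regularization as a feature-selection method but does not give its exact setup. The stdlib-only Python sketch below assumes the classical Lasso objective (1/2n)·||y − Xw||² + α·||w||₁, fitted to the 0/1 diabetes label after standardizing every feature. It minimizes that objective by cyclic coordinate descent. The soft-thresholding step sets weak coefficients exactly to zero, and the features whose weights stay nonzero are the ones kept. The function names, `alpha`, and the stopping rule are illustrative assumptions rather than the author's settings. For a binary target, an L1-penalized logistic regression is a common alternative. In practice, scikit-learn's `Lasso` combined with `SelectFromModel` is a typical off-the-shelf route.

```python
from typing import List, Sequence, Tuple


def soft_threshold(a: float, t: float) -> float:
    """Proximal operator of t*|w|: shrinks a toward zero by t."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def lasso_select(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    alpha: float,
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> Tuple[List[int], List[float]]:
    """Fit min_w (1/2n)||y - Xw||^2 + alpha*||w||_1 on standardized columns by
    cyclic coordinate descent; return (indices of nonzero weights, weights)."""
    n, p = len(X), len(X[0])
    # Standardize each column (population std); constant columns become all zeros.
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mean) / std if std > 0 else 0.0 for v in col])
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # centered y removes the intercept; w = 0
    z = [sum(v * v for v in col) / n for col in cols]
    w = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant feature: its weight stays 0
            col = cols[j]
            # Correlation of feature j with the residual, with w_j's own term added back.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / z[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep residual = y_centered - X_std @ w in sync with the update.
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_wj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    return selected, w
```

Larger `alpha` values zero out more coefficients. In practice `alpha` is usually tuned by cross-validation on the training split before the reduced feature set is passed to the classifiers.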

Related Organizations

Ufuk University, Turkey
