
Detecting diabetes is crucial for the effective management and prevention of a disease that poses significant health risks worldwide. This study introduces a novel approach to diabetes detection that combines data balancing techniques with feature selection methods, including Lasso (L1) regularization, to improve the performance of predictive models on imbalanced datasets. Random Under-Sampling (RUS), Adaptive Synthetic Sampling (ADASYN), and the Synthetic Minority Over-sampling Technique (SMOTE) were applied together with Random Forest (RF), CatBoost (CB), Extreme Gradient Boosting (XGB), K-Nearest Neighbors (KNN), Gaussian Naive Bayes (GNB), Logistic Regression (LR), and Gradient Boosting (GB) models to assess their effect on accuracy and generalization. Among these combinations, the RF model with SMOTE achieved the highest accuracy, 93.25%, underscoring the importance of appropriate data handling strategies for predictive performance. When all features were used without selection, the RF model reached an accuracy of 95.31%, indicating its capacity to capture complex patterns when the full feature set is available. Overall, the proposed methodology achieved higher diabetes-detection accuracy than previously reported studies in the literature and offers useful insights for developing reliable prediction models in healthcare.
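The abstract does not describe how SMOTE creates synthetic samples. As a minimal sketch of the generic technique (not the authors' code), the Python below oversamples the minority class by interpolating between a randomly chosen minority sample and one of its k nearest minority-class neighbors. The function names `smote` and `balance_with_smote`, the Euclidean distance, the default `k=5`, and the binary-label setup are illustrative assumptions; a real pipeline would more likely rely on a maintained implementation such as the `SMOTE` class in imbalanced-learn.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic minority-class points by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists of floats) from the minority class.
    k: number of nearest minority neighbors considered for each seed point (assumed default).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the seed point.
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        # Choose one of the k nearest neighbors at random.
        neighbor = minority[rng.choice(others[:k])]
        # The new point lies on the segment between x and the chosen neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


def balance_with_smote(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts (binary case).

    Returns new lists; the inputs X and y are not modified.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    needed = max(0, n_majority - len(minority))
    new_points = smote(minority, needed, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)
```

For example, with six majority samples and three minority samples, `balance_with_smote(X, y, minority_label=1, k=2)` returns twelve samples, six per class, and every synthetic point lies on a segment between two original minority points. Oversampling of this kind is normally applied to the training split only, so that no synthetic point is derived from a test sample.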
Machine Learning (Other), Diabetes detection; data balancing techniques; imbalanced datasets; predictive modeling; health informatics
