handle: 2117/418471
Class imbalance is a common problem in Machine Learning (ML) that introduces bias during the training phase of ML models, compromising their accuracy and reliability. The problem is particularly critical in fields such as disease diagnosis and credit risk assessment, where accurately predicting the minority class is crucial. Despite extensive research on class imbalance, its treatment in the LightGBM model, especially through hyperparameter optimisation, remains underexplored. This thesis investigates the influence of hyperparameters on handling class imbalance in the LightGBM model. The main objectives are to identify which hyperparameters most significantly affect class imbalance and to determine whether hyperparameter optimisation can overcome this problem. We conducted a series of experiments to assess the individual impact of various LightGBM hyperparameters on class imbalance. We trained multiple configurations of LightGBM models, each varying only one hyperparameter while keeping all others at their default values. These configurations were evaluated using key performance metrics such as AUC, recall, and F1 score to determine how well they predict the minority class. This approach identified the hyperparameters that most significantly affect class imbalance. In a second study, we employed Bayesian optimisation to find the optimal combination of hyperparameters. This optimal combination was then compared against results from similar studies to evaluate its effectiveness in overcoming class imbalance. Our findings identify is_unbalance and max_depth as the hyperparameters that most significantly influence LightGBM's performance on class-imbalanced datasets. Setting is_unbalance incorrectly results in LightGBM identifying only 1.6% of minority class instances, whereas setting it correctly enables LightGBM to identify up to 70% of such instances.
Additionally, our research concludes that hyperparameter optimisation significantly enhances LightGBM's ability to detect the minority class compared to using default hyperparameters. This demonstrates that optimising hyperparameters is crucial for effectively addressing class imbalance.
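The one-at-a-time design described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's code: `is_unbalance` and `max_depth` are the hyperparameters named above, `num_leaves` and `learning_rate` are other LightGBM parameters listed only to show that non-varied settings stay fixed, and the grid values and the helper names `one_at_a_time` and `minority_metrics` are assumptions made for the example. Training itself is omitted, and AUC is left out because it requires predicted scores rather than hard labels.

```python
from typing import Any, Dict, Iterator, List, Sequence, Tuple

# Baseline settings that stay fixed unless a parameter is being varied.
# The values are assumed defaults for illustration.
DEFAULTS: Dict[str, Any] = {
    "is_unbalance": False,
    "max_depth": -1,
    "num_leaves": 31,
    "learning_rate": 0.1,
}

# Hypothetical candidate values for the two hyperparameters the abstract names.
GRID: Dict[str, List[Any]] = {
    "is_unbalance": [False, True],
    "max_depth": [3, 6, 12],
}


def one_at_a_time(
    defaults: Dict[str, Any], grid: Dict[str, List[Any]]
) -> Iterator[Tuple[str, Any, Dict[str, Any]]]:
    """Yield (name, value, params) where params differs from defaults in one key at most."""
    for name, values in grid.items():
        for value in values:
            params = dict(defaults)  # copy so the baseline is never mutated
            params[name] = value
            yield name, value, params


def minority_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Recall, precision and F1 for the minority (positive) class from hard labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    for name, value, params in one_at_a_time(DEFAULTS, GRID):
        # A real run would train a model with `params` here and score its predictions.
        print(name, value, params)
```

Each yielded `params` dictionary could be handed to a LightGBM training call, and the resulting predictions scored with `minority_metrics`, so that differences in minority-class recall can be attributed to the single hyperparameter that changed.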
Maskininlärning, 330, Computer Sciences, Optimització d'hiperparàmetres, Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic, Predicción de clase minoritaria, Förutsägelse av minoritetsklass, Hyperparameteroptimering, Aprendizaje automático, LightGBM, Desequilibri de classes, Class Imbalance, Machine Learning, Datavetenskap (datalogi), Desequilibrio de clases, Optimización de hiperparámetros, Aprenentatge automàtic, Classes socials, Social classes, Predicció de classe minoritària, Klassobalans, Hyperparameter Optimisation, Minority Class Prediction
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 122 |
| downloads | | 132 |

Views provided by UsageCounts
Downloads provided by UsageCounts