Name: Impact of lightGBM hyperparameters on class imbalance
Creator: Caballero Castro, Joan
Keywords: Machine learning, 330, Computer Sciences, Hyperparameter optimisation, UPC subject areas::Computer science::Artificial intelligence::Machine learning, Minority class prediction

Publication: Bachelor thesis, 01 Jan 2024, English. Publisher: Universitat Politècnica de Catalunya

handle: 2117/418471

Abstract

Class imbalance is a common problem in Machine Learning (ML) that introduces bias during the training phase of ML models, compromising their accuracy and reliability. This problem is particularly critical in fields such as disease diagnosis and credit risk assessment, where it is crucial to accurately predict the minority class. Despite extensive research on class imbalance, its treatment in the LightGBM model, especially through hyperparameter optimisation, remains underexplored. This thesis investigates the influence of hyperparameters on handling class imbalance in the LightGBM model. The main objectives are to identify which hyperparameters most significantly affect performance under class imbalance and to determine whether hyperparameter optimisation can overcome this problem. A series of experiments was conducted to assess the individual impact of various LightGBM hyperparameters on class imbalance. We trained multiple configurations of LightGBM models, each varying only one hyperparameter while keeping all others at their default values. These configurations were evaluated using key performance metrics such as AUC, recall, and F1 score to determine their efficacy in predicting the minority class. This approach identified the hyperparameters that most significantly affect performance on imbalanced data. Furthermore, we conducted a second study employing Bayesian optimisation to find the optimal combination of hyperparameters. This combination was then compared against results from similar studies to evaluate its effectiveness in overcoming class imbalance. Our findings identify is_unbalance and max_depth as the hyperparameters that most significantly influence LightGBM's performance on class-imbalanced datasets. Setting is_unbalance incorrectly results in LightGBM identifying only 1.6% of minority class instances, whereas setting it correctly enables LightGBM to identify up to 70% of such instances.
Additionally, our research concludes that hyperparameter optimisation significantly enhances LightGBM's ability to detect the minority class compared to using default hyperparameters. This demonstrates that optimising hyperparameters is crucial for effectively addressing class imbalance.
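The one-hyperparameter-at-a-time study described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's code. `DEFAULTS` holds real LightGBM defaults for four parameters. The candidate values in `GRID` and all function names are hypothetical. The metric helpers compute recall, F1 and ROC AUC for the minority class, assumed to be label 1. They are written from scratch so the sketch runs without LightGBM or scikit-learn.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Real LightGBM defaults for these four parameters.
DEFAULTS: Dict[str, object] = {
    "is_unbalance": False,
    "max_depth": -1,  # <= 0 means no depth limit
    "num_leaves": 31,
    "learning_rate": 0.1,
}
# Hypothetical candidate values, for illustration only.
GRID: Dict[str, List[object]] = {
    "is_unbalance": [True],
    "max_depth": [3, 6, 12],
    "learning_rate": [0.01, 0.3],
}


def one_at_a_time(
    defaults: Dict[str, object], grid: Dict[str, List[object]]
) -> Iterator[Tuple[str, object, Dict[str, object]]]:
    """Yield (name, value, config) where config differs from defaults in exactly one key."""
    for name, values in grid.items():
        for value in values:
            if value == defaults[name]:
                continue  # identical to the baseline configuration
            config = dict(defaults)
            config[name] = value
            yield name, value, config


def recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Recall and F1 of the minority class, assumed to be label 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


for name, value, config in one_at_a_time(DEFAULTS, GRID):
    print(f"{name}={value}: {config}")
```

In a real run, each configuration would be passed to LightGBM, for example as keyword arguments to `lightgbm.LGBMClassifier`. The resulting predictions would then be scored with the three helpers. Only that training step is omitted here.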
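The large effect of `is_unbalance` is easier to interpret once it is clear what the flag does. LightGBM's documentation describes it as a switch for unbalanced binary training data. It cannot be combined with `scale_pos_weight`, and it can produce poor estimates of individual class probabilities. The sketch below rests on an assumption drawn from our reading of LightGBM's binary objective source: the flag up-weights the smaller class by the ratio of class counts. Each example's log-loss gradient and Hessian are then multiplied by that weight. The function names are hypothetical, and the default `sigmoid` of 1 is assumed.

```python
import math
from typing import Dict, Sequence, Tuple


def is_unbalance_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Assumed is_unbalance weighting: smaller class gets count_larger / count_smaller, larger class keeps 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return {0: 1.0, 1: 1.0}  # single-class data: nothing to rebalance
    if n_pos > n_neg:
        return {0: n_pos / n_neg, 1: 1.0}
    return {0: 1.0, 1: n_neg / n_pos}


def weighted_grad_hess(score: float, label: int, weight: float) -> Tuple[float, float]:
    """Gradient and Hessian of the weighted binary log loss w.r.t. the raw score (sigmoid = 1)."""
    p = 1.0 / (1.0 + math.exp(-score))  # predicted probability of class 1
    return weight * (p - label), weight * p * (1.0 - p)


labels = [0] * 98 + [1] * 2
w = is_unbalance_weights(labels)  # {0: 1.0, 1: 49.0}
g, h = weighted_grad_hess(0.0, 1, w[1])  # (-24.5, 12.25)
print(w, g, h)
```

Take 2 positives among 100 examples. At the same raw score, each minority example's gradient is 49 times larger than it would be without weighting. The boosted trees are therefore pushed much harder to fit the minority class.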
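The second study relies on Bayesian optimisation. The abstract does not name a surrogate model or library, so the following is a generic, dependency-free sketch under our own assumptions. It uses a Gaussian-process surrogate with an RBF kernel, with a length scale of 3 chosen arbitrarily, and the expected-improvement acquisition function. It searches a single integer hyperparameter such as `max_depth`. `toy_f1` is a hypothetical stand-in for a validation score. In practice it would train LightGBM with the candidate value and return, for example, the minority-class F1.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rbf(a: float, b: float, length: float = 3.0) -> float:
    """Squared-exponential kernel; the length scale is an arbitrary assumption."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(k: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L.T == k (k symmetric positive definite)."""
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(k[i][i] - s) if i == j else (k[i][j] - s) / low[j][j]
    return low


def forward(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L x = b by forward substitution."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][m] * x[m] for m in range(i))) / low[i][i])
    return x


def backward(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[m][i] * x[m] for m in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(xs: Sequence[float], ys: Sequence[float], x: float,
                 jitter: float = 1e-6) -> Tuple[float, float]:
    """Posterior mean and std at x of a near-noise-free GP whose prior mean is mean(ys)."""
    k = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    m = sum(ys) / len(ys)
    alpha = backward(low, forward(low, [y - m for y in ys]))  # K^{-1} (y - m)
    k_star = [rbf(x, a) for a in xs]
    mu = m + sum(ks * al for ks, al in zip(k_star, alpha))
    v = forward(low, k_star)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mu, math.sqrt(var)


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.01) -> float:
    """EI for maximisation: E[max(f(x) - best - xi, 0)] with f(x) ~ N(mu, sigma^2)."""
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best - xi) * cdf + sigma * pdf


def bayes_opt(objective: Callable[[int], float], candidates: Sequence[int],
              n_iter: int = 6) -> Tuple[int, float, int]:
    """Maximise objective over a discrete candidate set; returns (best_x, best_y, evaluations)."""
    xs = [candidates[0], candidates[-1]]  # initial design: the two extremes
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        untried = [c for c in candidates if c not in xs]
        if not untried:
            break
        best = max(ys)
        scores = {c: expected_improvement(*gp_posterior(xs, ys, c), best) for c in untried}
        nxt = max(scores, key=scores.get)  # candidate with the highest EI
        xs.append(nxt)
        ys.append(objective(nxt))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], len(xs)


def toy_f1(max_depth: int) -> float:
    """Hypothetical stand-in for a validation F1 score; peaks at max_depth = 6."""
    return 1.0 - ((max_depth - 6) / 10.0) ** 2


best_depth, best_score, n_evals = bayes_opt(toy_f1, list(range(1, 16)))
print(best_depth, best_score, n_evals)
```

A full study would search several hyperparameters jointly, for example `is_unbalance`, `max_depth` and `learning_rate`. That would require a multi-dimensional kernel or an off-the-shelf optimiser. The single-parameter version only shows the loop: fit the surrogate, pick the candidate with the highest expected improvement, evaluate it, and repeat.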

Related Organizations

Royal Institute of Technology
Sweden
Universitat Politècnica de Catalunya
Spain

Keywords

Machine learning, 330, Computer Sciences, Hyperparameter optimisation, UPC subject areas::Computer science::Artificial intelligence::Machine learning, Minority class prediction, LightGBM, Class imbalance, Computer science, Social classes

Impact by BIP!

- Selected citations: 0 (citations derived from selected sources; an alternative to the Influence indicator)
- Popularity: Average (the current impact or attention of the article in the research community, based on the underlying citation network)
- Influence: Average (the overall impact of the article in the research community over time, based on the underlying citation network)
- Impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)