Powered by OpenAIRE graph
UPCommons. Portal de...
Publikationer från KTH
Bachelor thesis . 2024

This Research product is the result of merged Research products in OpenAIRE.

Impact of lightGBM hyperparameters on class imbalance

Authors: Caballero Castro, Joan

Abstract

Class imbalance is a common problem in Machine Learning (ML) that introduces bias during the training phase of ML models, compromising their accuracy and reliability. The problem is particularly critical in fields such as disease diagnosis and credit risk assessment, where accurately predicting the minority class is crucial. Despite extensive research on class imbalance, its treatment in the LightGBM model, especially through hyperparameter optimisation, remains underexplored. This thesis investigates how hyperparameters influence the handling of class imbalance in the LightGBM model. The main objectives are to identify which hyperparameters most strongly affect performance under class imbalance and to determine whether hyperparameter optimisation can overcome the problem. A series of experiments was conducted to assess the individual impact of various LightGBM hyperparameters. We trained multiple LightGBM configurations, each varying only one hyperparameter while keeping all others at their default values. These configurations were evaluated using AUC, recall, and F1 score to determine how well they predict the minority class. This approach identified the hyperparameters with the largest effect. In a second study, we used Bayesian optimisation to find the optimal combination of hyperparameters, and compared that combination against results from similar studies to evaluate its effectiveness in overcoming class imbalance. Our findings identify is_unbalance and max_depth as the hyperparameters that most significantly influence LightGBM's performance on class-imbalanced datasets. Setting is_unbalance incorrectly results in LightGBM identifying only 1.6% of minority class instances, whereas setting it correctly enables LightGBM to identify up to 70% of such instances.
Additionally, our research concludes that hyperparameter optimisation significantly enhances LightGBM's ability to detect the minority class compared to using default hyperparameters. This demonstrates that optimising hyperparameters is crucial for effectively addressing class imbalance.
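
The abstract judges models by recall and F1 on the minority class rather than by overall accuracy. The sketch below shows why, using only the Python standard library. The labels and predictions are made-up illustrative data, not results from the thesis, and the helper names `minority_metrics` and `positive_class_weight` are hypothetical. The weight helper assumes the common convention of up-weighting the positive class by the negative-to-positive ratio, which is the kind of rebalancing that options such as is_unbalance are meant to apply; it is not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class MinorityMetrics:
    recall: float
    precision: float
    f1: float
    accuracy: float


def minority_metrics(y_true, y_pred, positive=1):
    """Compute recall, precision, F1 (for the positive class) and accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs)
    return MinorityMetrics(recall, precision, f1, accuracy)


def positive_class_weight(y_true, positive=1):
    """Assumed convention: weight positives by n_negative / n_positive."""
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    return n_neg / n_pos


# Illustrative data only: 50 minority (1) and 950 majority (0) instances.
y_true = [1] * 50 + [0] * 950

# A model biased towards the majority class: finds 1 of 50 positives.
biased_pred = [1] * 1 + [0] * 49 + [0] * 950

# A rebalanced model: finds 35 of 50 positives at the cost of 60 false alarms.
rebalanced_pred = [1] * 35 + [0] * 15 + [1] * 60 + [0] * 890

print("biased    :", minority_metrics(y_true, biased_pred))
print("rebalanced:", minority_metrics(y_true, rebalanced_pred))
print("assumed positive-class weight:", positive_class_weight(y_true))
```

In this toy data the biased model has the higher accuracy but misses almost every minority instance, which is why recall and F1 are the more informative metrics here.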

Keywords

Machine Learning, 330, Computer Sciences, Hyperparameter Optimisation, UPC subject areas::Computer science::Artificial intelligence::Machine learning, Minority Class Prediction, LightGBM, Class Imbalance, Social classes

  • BIP! impact indicators
    selected citations: 0
    popularity: Average
    influence: Average
    impulse: Average
  • OpenAIRE UsageCounts
    views: 122
    downloads: 132
Access route: Green