
Financial institutions need accurate estimates of loan default risk in order to reduce credit losses and sustain lending. This study proposes a robust stacking-based machine learning framework that integrates Knowledge Graph Embedding (KGE) for semantic feature enrichment with XGBoost as the final predictive model. The approach is evaluated on the Home Credit Default Risk (HCDR) dataset, which comprises diverse financial, demographic, and behavioral attributes of loan applicants. A comprehensive preprocessing pipeline, including imputation, normalization, one-hot encoding, and correlation-based feature selection, ensures data quality and model generalizability. The proposed KGE-XGBoost model captures both structured tabular and relational semantics by transforming borrower-entity relationships into dense embeddings, which are concatenated with the original features to form a unified representation. Experimental results show 96.79% accuracy (ACC), 80.83% precision (PRE), 78.75% recall (REC), and an F1-score (F1) of 79.00%. The proposed model outperforms the baseline models (Random Forest achieved ACC 94.20%, NN achieved ACC 89%, and DT achieved ACC 73%), particularly in scenarios with class imbalance. The KGE integration is found to contribute substantially to feature expressiveness, and the framework offers a scalable and promising credit risk assessment solution for real-life financial applications.
Machine Learning, Knowledge Graph Embedding (KGE), Classification Models, Credit Risk Assessment, Loan Default Prediction, Feature Enrichment, XGBoost
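The feature-enrichment step described in the abstract (dense embeddings concatenated with the original tabular features) can be illustrated with a minimal sketch. This is not the authors' implementation: the array names, shapes, the applicant-to-entity mapping, and the random stand-in for trained embeddings are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 4 applicants with 3 preprocessed tabular features each.
tabular = rng.normal(size=(4, 3))

# Hypothetical embedding table: one 5-dimensional vector per graph entity.
# In practice these would come from a trained KGE model; random values stand in here.
entity_embeddings = rng.normal(size=(10, 5))

# Hypothetical mapping from each applicant row to its entity index in the graph.
applicant_entity_ids = np.array([2, 7, 0, 9])


def build_unified_features(tabular, entity_embeddings, entity_ids):
    """Concatenate each applicant's tabular features with its graph embedding."""
    embedded = entity_embeddings[entity_ids]  # shape: (n_applicants, emb_dim)
    return np.concatenate([tabular, embedded], axis=1)


unified = build_unified_features(tabular, entity_embeddings, applicant_entity_ids)
print(unified.shape)  # (4, 8)
```

In a pipeline like the one the abstract describes, a matrix such as `unified` would then be passed to the final classifier (XGBoost in the paper) in place of the tabular features alone.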
