ZENODO
Article . 2025
Data sources: ZENODO
Data sources: Datacite

Leveraging Network Topology for Credit Risk Assessment in P2P Lending: A Comparative Study under the Lens of Machine Learning

Authors: Baals, Lennart John

Abstract

Overview

This deposit provides the materials required to reproduce the empirical workflow, figures, and manuscript source for the study “Leveraging Network Topology for Credit Risk Assessment in P2P Lending: A Comparative Study under the Lens of Machine Learning.” The study forms Chapter 3 of the doctoral dissertation “Risk Management in Digital Finance: Assessment and Pricing in an Emerging Fintech Era” by Lennart John Baals and is published as:

Liu, Y., Baals, L. J., Osterrieder, J., & Hadji-Misheva, B. (2024). Leveraging network topology for credit risk assessment in P2P lending: A comparative study under the lens of machine learning. Expert Systems with Applications, 252, 124100.

This deposit contains (i) the LaTeX source of the thesis chapter, (ii) Jupyter notebooks implementing data preprocessing, network construction, model training/evaluation, and explainability analyses, and (iii) figure outputs summarizing descriptive statistics, feature importance, ROC performance, and network-centrality characteristics.

Contents of this deposit (file-level summary)

  • Manuscript / thesis chapter source: main_WP3_PhD_Lennart_Baals.tex, bibliography files (e.g., reference_paper_1.bib), and formatting assets (e.g., apa.bst).
  • Jupyter notebooks (analysis pipeline):
    Preprocessing & feature engineering: 0.1_data_preprocessing.ipynb, 0.2_descriptive_statistics.ipynb
    Model training and evaluation workflow: 1_2023.01.05 Data_Pre-processing_&_Models_training.ipynb, 2_2023.07.05_Models_analysis.ipynb, 3_2023.06.28 Model Re-training and testing.ipynb, 4_2023.07.05_Models_analysis.ipynb
    Explainability: 4_2023.06.28 SHAP Explainability.ipynb
    Additional experimentation / automation: 2023.04.22 SNF P2P Credit Risk Auto ML.ipynb
  • Key figures / outputs (PDF):
    Descriptive statistics of raw variables (e.g., interest rate, loan amount, borrower characteristics, prior-loan measures): descriptive_stats_raw_data_full (...).pdf
    Network-centrality descriptive statistics: descriptive_stats_pagerank.pdf, descriptive_stats_betweenness.pdf, descriptive_stats_closeness.pdf, descriptive_stats_katz.pdf, descriptive_stats_authority.pdf, descriptive_stats_hub.pdf
    Model performance summaries: all_model_roc_curves.pdf
    Feature importance / model diagnostics: rf_feature_importance.pdf, glm_feature_importance.pdf, dl_feature_importance.pdf, plus best-model summaries such as RF (best_result).pdf, GLM (best_result).pdf, DL (best_result).pdf

Methodological summary (what the code produces)

The workflow implements a network-enhanced credit risk assessment framework for P2P lending. In brief, the analysis:

  • constructs a borrower/loan similarity graph using origination-time information and derives network representations,
  • extracts multiple centrality measures (e.g., PageRank, betweenness, closeness, Katz, hub/authority) as additional predictors that encode structural information about similarity-based borrower position,
  • trains and compares several machine-learning models for default prediction (including linear baselines and non-linear learners), and
  • evaluates predictive performance using standard classification metrics and ROC curves, complemented by feature-importance and SHAP-based explainability analyses.
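The first two steps of the methodological summary (similarity graph, then centrality features) can be illustrated with a minimal, standard-library-only sketch. It is not taken from the notebooks: the linking rule (Euclidean distance below a threshold), the function names build_similarity_graph and pagerank, the toy feature values, and the damping factor are all assumptions chosen for illustration. The deposited code may define similarity differently and may rely on a graph library instead.

```python
import math
from itertools import combinations


def build_similarity_graph(features, threshold):
    """Link two borrowers when the Euclidean distance between their
    (already scaled) feature vectors is below `threshold`.
    The distance-threshold rule is an assumption for this sketch."""
    graph = {node: set() for node in features}
    for a, b in combinations(features, 2):
        if math.dist(features[a], features[b]) < threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph stored as
    {node: set_of_neighbours}. Isolated nodes spread their rank uniformly."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        # Rank mass held by nodes without neighbours is redistributed evenly.
        dangling = sum(rank[v] for v in graph if not graph[v])
        new_rank = {}
        for node in graph:
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Made-up borrower features (e.g. scaled loan amount and interest rate).
toy_features = {
    "A": (0.10, 0.20),
    "B": (0.15, 0.25),
    "C": (0.20, 0.20),
    "D": (0.90, 0.90),
}
toy_graph = build_similarity_graph(toy_features, threshold=0.12)
toy_rank = pagerank(toy_graph)  # one centrality score per borrower
```

In this toy example, borrowers A, B, and C are mutually similar and form a triangle, while D stays isolated and therefore receives a lower PageRank score. In a workflow of the kind described, such scores would be appended to each loan's feature row as additional predictors.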
The included outputs summarize both the distributional characteristics of the raw data and the incremental predictive value of network-topology features across model classes.

Data sources and access conditions

The empirical component relies on loan-level P2P lending data (e.g., platform data such as Bondora and/or comparable sources, depending on the chapter configuration). Redistribution may be restricted by data-provider terms and privacy constraints. This deposit therefore emphasizes code, documentation, and figure outputs. Users intending to fully reproduce all results should obtain the underlying raw data from the original provider(s) under their own access rights and then apply the provided preprocessing and variable mapping steps as documented in the notebooks. Any included data descriptions are intended to facilitate transparent replication while respecting the applicable redistribution constraints.

Reproducibility (how to run)

A typical reproduction path is:

1. Run the preprocessing notebooks (0.1_data_preprocessing.ipynb, 0.2_descriptive_statistics.ipynb) to generate cleaned features and descriptive tables.
2. Execute the training/evaluation notebooks (1_..., 2_..., 3_..., 4_...) to reproduce model estimation, ROC curves, and feature-importance outputs.
3. Run the explainability notebook (4_2023.06.28 SHAP Explainability.ipynb) to reproduce SHAP summaries and interpretability results.
4. Compile the thesis chapter from main_WP3_PhD_Lennart_Baals.tex (using the included bibliography/style assets) if you wish to regenerate the manuscript PDF.

Intended use

This deposit is intended for:

  • replication of the published results (subject to data access constraints),
  • reuse of the similarity-graph + centrality-feature construction approach for other P2P or retail-credit datasets, and
  • benchmarking of network-enhanced models against conventional credit-scoring baselines.
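For the benchmarking use case, one simple way to compare a conventional baseline with a network-enhanced model is the difference in area under the ROC curve (AUC). The sketch below is a hedged illustration using only the standard library. The function name roc_auc and all labels and scores are invented for this example. They are not results from the study, and the notebooks may compute AUC with a machine-learning library instead.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen defaulter (label 1)
    receives a higher score than a randomly chosen non-defaulter (label 0);
    ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented labels and predicted default probabilities, for illustration only.
defaults = [0, 0, 1, 0, 1, 1]
baseline_scores = [0.20, 0.40, 0.35, 0.10, 0.80, 0.30]  # hypothetical baseline model
network_scores = [0.15, 0.30, 0.45, 0.10, 0.85, 0.40]   # hypothetical network-enhanced model

baseline_auc = roc_auc(defaults, baseline_scores)
network_auc = roc_auc(defaults, network_scores)
auc_gain = network_auc - baseline_auc  # incremental AUC on the same test set
```

The pairwise formulation is quadratic in the number of loans, which is fine for a small illustration. For realistic sample sizes a rank-based or library implementation would be the more practical choice. Both models must be scored on the same held-out loans for the difference to be meaningful.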
Licensing and reuse

Unless otherwise noted within individual files, the intent is to enable reuse for academic and non-commercial research with appropriate attribution. If different licenses apply to code vs. manuscript text/figures, this should be reflected in the record license choice.

Keywords

Machine Learning, Fintech, P2P Lending, Credit Risk

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average