Powered by OpenAIRE graph
Recolector de Ciencia Abierta, RECOLECTA
Bachelor thesis, 2023
License: CC BY-NC-ND

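Techniques for handling missing data and their application to a bank fraud detection problem (original title: Técnicas de tratamiento de datos faltantes y aplicación en problema de detección de fraude bancario)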

Author: Zas Pérez, Alexandre

Abstract

[Abstract] This document reviews the main techniques for handling missing values and applies them to a fraudulent transaction detection problem. Some of the 134 initial variables are analyzed, and the data are preprocessed through variable selection, variable creation, and imputation of missing data with several techniques. The data are split into electronic banking transactions and mobile banking transactions. For the first dataset, the best results are obtained with the XGBoost algorithm using missForest as the imputation technique; for the second dataset, the best model is CatBoost, also with missForest as the imputation technique.
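The abstract names missForest as the imputation technique behind both best models but gives no implementation details. As a rough illustration only, the sketch below shows the general shape of a missForest-style loop: start from a mean fill, then repeatedly re-predict each incomplete column from the other columns until the imputed values stop improving. It is not the thesis code. To stay dependency-free it swaps the random forest for a small k-nearest-neighbours regressor, handles numeric columns only, and assumes every column has at least one observed value. All names (`iterative_impute`, `knn_predict`, `column_means`, `example`) are hypothetical.

```python
import math


def column_means(rows):
    """Mean of the observed (non-None) values in each column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def knn_predict(train_x, train_y, query, k):
    """Average target of the k training rows closest to `query` (Euclidean)."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def iterative_impute(rows, k=2, max_iter=10):
    """missForest-style loop; a k-NN regressor stands in for the random forest."""
    n_rows, n_cols = len(rows), len(rows[0])
    missing = [[v is None for v in r] for r in rows]
    means = column_means(rows)
    # Initial guess: fill every gap with its column mean.
    filled = [[means[j] if r[j] is None else float(r[j]) for j in range(n_cols)]
              for r in rows]
    # Visit incomplete columns from fewest to most missing values.
    order = sorted((j for j in range(n_cols) if any(m[j] for m in missing)),
                   key=lambda j: sum(m[j] for m in missing))
    prev_change = math.inf
    for _ in range(max_iter):
        previous = [r[:] for r in filled]
        for j in order:
            others = [c for c in range(n_cols) if c != j]
            # Train on rows where column j was actually observed.
            train_x = [[filled[i][c] for c in others]
                       for i in range(n_rows) if not missing[i][j]]
            train_y = [filled[i][j] for i in range(n_rows) if not missing[i][j]]
            for i in range(n_rows):
                if missing[i][j]:
                    query = [filled[i][c] for c in others]
                    filled[i][j] = knn_predict(train_x, train_y, query, k)
        change = sum((filled[i][j] - previous[i][j]) ** 2
                     for i in range(n_rows) for j in range(n_cols))
        if change > prev_change:
            # The update got worse: keep the previous imputation and stop.
            return previous
        if change == 0:
            break
        prev_change = change
    return filled


# Toy data (made up for illustration): the second column is twice the first.
example = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, None],
    [4.0, 8.0],
]
imputed = iterative_impute(example, k=2)
print(imputed[2][1])
```

In practice the imputed matrix would then be passed to a classifier such as XGBoost or CatBoost, as the abstract describes; that step is omitted here because the thesis configuration is not given in this record.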

Bachelor's thesis (UDC.FIC). Data Science and Engineering. Academic year 2022/2023

Country
Spain
Keywords

Missing data, Data imputation, Fraudulent transactions, MissForest, XGBoost, CatBoost

  • Impact by BIP!
    selected citations: 0 (derived from selected sources; an alternative to the "Influence" indicator)
    popularity: Average (the "current" impact or attention of an article in the research community at large, based on the underlying citation network)
    influence: Average (the overall impact of an article in the research community at large, based on the underlying citation network, diachronically)
    impulse: Average (the initial momentum of an article directly after its publication, based on the underlying citation network)
Access route: Green