
handle: 2183/33270
[Abstract] This work reviews the main techniques for handling missing values and applies them to a fraudulent transaction detection problem. An initial analysis covers some of the 134 original variables, followed by data preprocessing that includes variable selection, creation of new variables, and imputation of missing data with several techniques. The data are split into electronic banking transactions and mobile banking transactions. For the electronic banking set, the best results are obtained with XGBoost using missForest as the imputation technique; for the mobile banking set, the best model is CatBoost, also with missForest as the imputation technique.
Bachelor's thesis (UDC.FIC). Data Science and Engineering. Academic year 2022/2023
CatBoost, Data imputation, Fraudulent transactions, Missing data, missForest, XGBoost
