Powered by OpenAIRE graph
Open Access: Online-Publikations-...
https://dx.doi.org/10.25972/op...
Doctoral thesis, 2022
License: CC BY-NC-SA
Data sources: DataCite
Doctoral thesis
Data sources: DBLP
Versions: 3

Detecting Anomalies in Transaction Data

Author: Schlör, Daniel

Abstract

Detecting anomalies in transaction data is an important task with high potential to avoid financial losses caused by irregularities committed deliberately or inadvertently, such as credit card fraud, occupational fraud in companies, or ordering and accounting errors. With the ongoing digitization of our world, data-driven approaches, including machine learning, can benefit from data with less manual effort and feature engineering. Many machine learning-based anomaly detection methods approach this by learning a precise model of normality from which anomalies can be distinguished. Modeling normality in transaction data, however, requires precisely capturing the distributions and dependencies within the data, with special attention to numerical dependencies such as those between quantities, prices, or amounts. Neural Arithmetic Logic Units (NALUs) have been proposed as a neural architecture for implicitly modeling numerical dependencies. In practice, however, they suffer from stability and precision issues. Therefore, we first develop an improved neural network architecture, iNALU, designed to better model numerical dependencies as found in transaction data. We compare this architecture to the previous approach and show in several experiments of varying complexity that our novel architecture provides better precision and stability. We integrate this architecture into two generative neural network models adapted for transaction data and investigate how well they model normal behavior. We show that both generative models can successfully model normal transaction data, with our neural architecture improving generative performance for one of them. Since categorical and numerical variables are common in transaction data, but many machine learning methods only process numerical representations, we explore different representation learning techniques to transform categorical transaction data into dense numerical vectors. 
We extend this approach by proposing an outlier-aware discretization, thus incorporating numerical attributes into the computation of categorical embeddings, and we investigate the resulting latent spaces as well as quantitative anomaly detection performance. Next, we evaluate different scenarios for anomaly detection on transaction data. We extend our iNALU architecture to a neural layer that can model both numerical and non-numerical dependencies and evaluate it in a supervised and a one-class setting. We investigate the stability and generalizability of our approach and show that it outperforms a variety of models in the balanced supervised setting and performs comparably in the one-class setting. Finally, we evaluate three approaches to using a generative model as an anomaly detector and compare their anomaly detection performance.
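The abstract does not spell out the three approaches for turning a generative model into an anomaly detector. As a minimal illustration of the general idea only, the sketch below uses the simplest possible generative model, a multivariate Gaussian fitted to normal transactions, and scores new transactions by their negative log-likelihood. The data (quantity and amount with an assumed unit price of 2.5), the small ridge term, and the 99th-percentile threshold are hypothetical choices for illustration, not taken from the thesis.

```python
import numpy as np


def fit_gaussian(X_normal: np.ndarray):
    """Fit mean and covariance of a multivariate Gaussian to normal data only."""
    mu = X_normal.mean(axis=0)
    # Small ridge keeps the covariance invertible (assumed value)
    cov = np.cov(X_normal, rowvar=False) + 1e-6 * np.eye(X_normal.shape[1])
    return mu, cov


def nll_scores(X: np.ndarray, mu: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Negative log-likelihood under the fitted Gaussian; higher means more anomalous."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every row
    mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)
    return 0.5 * (mahal + logdet + d * np.log(2 * np.pi))


# Hypothetical normal transactions: amount = quantity * 2.5 plus small noise
rng = np.random.default_rng(0)
qty = rng.integers(1, 20, size=500).astype(float)
amount = 2.5 * qty + rng.normal(0.0, 0.1, size=500)
X_train = np.column_stack([qty, amount])

mu, cov = fit_gaussian(X_train)
# One-class threshold: 99th percentile of the training scores (assumed choice)
threshold = np.quantile(nll_scores(X_train, mu, cov), 0.99)

X_test = np.array([[10.0, 25.0],    # consistent with the unit price
                   [10.0, 250.0]])  # amount inconsistent with the quantity
flags = nll_scores(X_test, mu, cov) > threshold
```

Because the Gaussian captures the strong correlation between quantity and amount, the second test transaction is flagged even though its quantity alone is unremarkable; a per-feature threshold on the amount would also need to know the quantity to reach the same verdict.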

Detecting anomalies in transaction data is an important objective with high potential to avoid financial losses resulting from irregularities committed deliberately or inadvertently, such as credit card fraud or ordering and billing errors. With ongoing digitization, data-driven approaches, including machine learning, can extract value from data with ever less manual effort. Many machine learning-based anomaly detection methods follow this approach by learning a precise model of normal data from which anomalies can then be distinguished. Modeling normal transaction data, however, requires accurately capturing the distributions and dependencies within the data, with particular attention to numerical dependencies such as those between quantities or monetary amounts. Neural Arithmetic Logic Units have been proposed as a neural architecture for implicitly modeling numerical dependencies. In practice, however, they suffer from stability and precision issues. We therefore first develop an improved neural network architecture, iNALU, designed to better model numerical dependencies as they occur in transaction data. We compare this architecture with its predecessor and show in several experiments that our architecture offers higher precision and stability. We integrate our architecture into two generative neural network models adapted to transaction data and investigate how well they model normal behavior. We show that both generative models can successfully model normal data, with the neural architecture presented in this thesis improving the generative results for one of them. 
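For readers unfamiliar with NALUs, the following NumPy sketch shows the forward pass of the original unit as published by Trask et al. (2018): a shared weight matrix W = tanh(W_hat) * sigmoid(M_hat), a summative path a = W x, a multiplicative path m = exp(W log(|x| + eps)), and a gate g = sigmoid(G x) that mixes them as y = g * a + (1 - g) * m. It does not implement the iNALU modifications developed in the thesis; the function names, parameter names, and example weights are illustrative assumptions.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def nalu_forward(x: np.ndarray, W_hat: np.ndarray, M_hat: np.ndarray,
                 G: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Forward pass of the original NALU (Trask et al., 2018), not iNALU.

    x: input vector of shape (n_in,); W_hat, M_hat, G: shape (n_out, n_in).
    """
    # Shared weights, pushed toward {-1, 0, 1} by tanh * sigmoid
    W = np.tanh(W_hat) * sigmoid(M_hat)
    # Summative path: addition and subtraction of inputs
    a = W @ x
    # Multiplicative path: summing logs multiplies the inputs; |x| drops the sign
    m = np.exp(W @ np.log(np.abs(x) + eps))
    # Input-dependent gate mixes the two paths
    g = sigmoid(G @ x)
    return g * a + (1.0 - g) * m


# Example: large raw weights make W close to [[1, 1]]
W_hat = np.full((1, 2), 20.0)
M_hat = np.full((1, 2), 20.0)
add_gate = np.full((1, 2), 20.0)   # for x = [3, 4] the gate is near 1: summative path
mul_gate = np.full((1, 2), -20.0)  # for x = [3, 4] the gate is near 0: multiplicative path

x = np.array([3.0, 4.0])
y_add = nalu_forward(x, W_hat, M_hat, add_gate)  # approximately [7.0]
y_mul = nalu_forward(x, W_hat, M_hat, mul_gate)  # approximately [12.0]
# Sign loss: the true product is -12, but the unit returns about +12
y_neg = nalu_forward(np.array([-3.0, 4.0]), W_hat, M_hat, mul_gate)
```

The last call illustrates one known limitation of the original formulation: because the multiplicative path operates on log|x|, it cannot represent the sign of a product, which matters for transaction data containing credits or returns.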
Since categorical and numerical variables frequently occur together in transaction data, but many machine learning methods process only numerical representations, we investigate various representation learning techniques for transforming categorical transaction data into dense numerical vectors. We extend these by proposing a discretization approach that accounts for outliers. This incorporates relationships of numerical data types into the computation of categorical embeddings in order to improve anomaly detection overall.
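The abstract names an outlier-aware discretization but does not describe how it works. The sketch below is a hypothetical illustration of one way such a step could look, not the thesis's method: values outside Tukey fences (an assumed choice, with k = 1.5) receive dedicated outlier tokens, and the remaining values are split into quantile bins computed on inliers only. The function name and token format are made up for illustration; the resulting tokens could then be treated like any other categorical attribute when learning embeddings.

```python
import numpy as np


def outlier_aware_bins(values, n_bins: int = 4, k: float = 1.5) -> list:
    """Hypothetical outlier-aware discretization (illustration only).

    Values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR] get dedicated
    tokens; the remaining values are split into quantile bins computed on
    inliers only, so extreme amounts do not stretch the regular bins.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = values[(values >= low) & (values <= high)]
    # Interior bin edges from inlier quantiles (n_bins - 1 edges)
    edges = np.quantile(inliers, np.linspace(0, 1, n_bins + 1)[1:-1])
    tokens = []
    for v in values:
        if v < low:
            tokens.append("amount_low_outlier")
        elif v > high:
            tokens.append("amount_high_outlier")
        else:
            # Index of the inlier bin the value falls into (0 .. n_bins - 1)
            tokens.append(f"amount_bin_{int(np.searchsorted(edges, v, side='right'))}")
    return tokens
```

For example, on the amounts 10 to 17 plus a single 1000, the eight regular amounts fall into four bins of two values each and 1000 becomes "amount_high_outlier"; with plain quantile binning over all nine values, 1000 would simply share the top bin with 16 and 17.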

Country: Germany
Keywords: ddc:000, anomaly detection, 000 Computer science, information science, general works

BIP! impact indicators: selected citations: 0; popularity: Average; influence: Average; impulse: Average