A simple method for unsupervised anomaly detection: An application to Web time series data

Publication: Article, Other literature type
Date: 01 Jan 2021
Language: English
Publisher: Public Library of Science (PLoS)
Journal: PLOS ONE, volume 17, page e0262463 (eissn: 1932-6203)

Authors: Keisuke Yoshihara; Kei Takahashi

doi: 10.1371/journal.pone.0262463 , 10.2139/ssrn.3871018

pmid: 35015791

pmc: PMC8752013

Abstract

We propose a simple anomaly detection method that is applicable to unlabeled time series data and is sufficiently tractable, even for non-technical entities, by using the density ratio estimation based on the state space model. Our detection rule is based on the ratio of log-likelihoods estimated by the dynamic linear model, i.e. the ratio of log-likelihood in our model to that in an over-dispersed model that we will call the NULL model. Using the Yahoo S5 data set and the Numenta Anomaly Benchmark data set, publicly available and commonly used benchmark data sets, we find that our method achieves better or comparable performance compared to the existing methods. The result implies that it is essential in time series anomaly detection to incorporate the specific information on time series data into the model. In addition, we apply the proposed method to unlabeled Web time series data, specifically, daily page view and average session duration data on an electronic commerce site that deals in insurance goods to show the applicability of our method to unlabeled real-world data. We find that the increase in page view caused by e-mail newsletter deliveries is less likely to contribute to completing an insurance contract. The result also suggests the importance of the simultaneous monitoring of more than one time series.
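The detection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a local-level dynamic linear model with fixed, hand-picked variances (`obs_var`, `state_var`), and it assumes the NULL model is the same filter with its observation variance multiplied by a hypothetical `inflation` factor. The function names and the threshold of 1 are also illustrative choices.

```python
import numpy as np

def local_level_loglik(y, obs_var, state_var, init_var=1e6):
    """One-step-ahead predictive log-densities under a local-level DLM.

    Assumed model (not taken from the paper):
        level_t = level_{t-1} + w_t,  w_t ~ N(0, state_var)
        y_t     = level_t     + v_t,  v_t ~ N(0, obs_var)
    """
    y = np.asarray(y, dtype=float)
    mean, var = y[0], init_var  # vague prior centred on the first value
    loglik = np.empty(len(y))
    for t, obs in enumerate(y):
        prior_var = var + state_var          # state variance before seeing obs
        forecast_var = prior_var + obs_var   # variance of the one-step forecast
        error = obs - mean
        loglik[t] = -0.5 * (np.log(2 * np.pi * forecast_var)
                            + error ** 2 / forecast_var)
        gain = prior_var / forecast_var      # Kalman gain
        mean = mean + gain * error
        var = (1 - gain) * prior_var
    return loglik

def loglik_ratio(y, obs_var, state_var, inflation):
    """Pointwise ratio of log-likelihoods: model over over-dispersed NULL."""
    model = local_level_loglik(y, obs_var, state_var)
    null = local_level_loglik(y, inflation * obs_var, state_var)
    return model / null

# Synthetic series with one injected spike (illustrative data only).
t = np.arange(30)
series = 10 + 0.3 * np.sin(t)
series[20] = 25.0

ratio = loglik_ratio(series, obs_var=1.0, state_var=0.1, inflation=25.0)
# Both log-densities are negative here because every forecast variance
# exceeds 1, so a ratio above 1 means the NULL model explains the point better.
flagged = np.flatnonzero(ratio > 1.0)
print(flagged)
```

In this sketch the injected spike receives the largest ratio, and points immediately after it can also exceed the threshold while the filter recovers. In practice the variances would be estimated from the data rather than fixed by hand.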

Related Organizations

The Institute of Statistical Mathematics, Japan
Fukuoka Institute of Technology, Japan
Northwestern Polytechnical University, China (People's Republic of)
Gunma University, Japan

Keywords

Internet, Time Factors, Science, Q, Statistics as Topic, R, Datasets as Topic, Search Engine, Medicine, Humans, Neural Networks, Computer, Algorithms, Research Article

Related research

NAB software on GitHub (relation: IsRelatedTo)

Impact by BIP!

selected citations: 12. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.