
Outlier detection over a sliding window is a fundamental problem in streaming data management and has been studied for more than ten years. The key to supporting outlier detection is to construct a neighbour-list for each object, which is used to predict which objects may become outliers and which can never become outliers. However, existing work ignores the fact that the number of outliers is usually small, so it is unnecessary to construct a neighbour-list for every object when it arrives in the window. Doing so incurs high space and computational costs and cannot work efficiently in an edge computing environment. In this paper, we propose a novel framework named PTAOD (Probabilistic Threshold-based Approximate Outlier Detection). First, we propose an algorithm that evaluates the probability of a newly arrived object becoming an outlier before it expires from the window, and uses the result to avoid unnecessary computation. In addition, we introduce a novel index, the ZHB-Tree (Z-order-based Hash BTree), to maintain streaming data. Finally, we propose a novel algorithm to maintain candidate outliers. Theoretical analysis and extensive experimental results demonstrate the effectiveness of the proposed algorithms.
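The abstract names Z-order as the basis of the ZHB-Tree but does not describe the index itself. As a rough illustration of the general Z-order (Morton) idea only, and not of the ZHB-Tree, the sketch below maps a 2-D point to a grid cell and interleaves the bits of the cell coordinates into a single integer key. The function names `interleave_bits` and `z_key`, the `cell_size` parameter, the 2-D setting, and the restriction to non-negative coordinates are all assumptions made for this example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) code of the non-negative grid cell (x, y).

    Bit i of x goes to position 2*i and bit i of y to position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_key(point, cell_size: float) -> int:
    """Map a 2-D point with non-negative coordinates to a Z-order key.

    The grid resolution `cell_size` is a hypothetical parameter chosen
    for illustration; the paper's actual partitioning is not given here.
    """
    cx = int(point[0] // cell_size)
    cy = int(point[1] // cell_size)
    return interleave_bits(cx, cy)


if __name__ == "__main__":
    # Points in the same grid cell share a key, so a hash table or
    # B-tree keyed on z_key can group spatially close stream objects.
    print(z_key((2.5, 1.0), 1.0))
    print(z_key((2.9, 1.4), 1.0))
```

Because nearby cells tend to receive nearby keys, a one-dimensional ordered structure keyed this way can serve approximate spatial lookups, which is presumably why a Z-order key suits a B-tree-style index.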
index, Information society, Data systems, Distributed computing, Data flow computing, 004 Data processing & computer science, QA75 Electronic computers. Computer science, streaming data, TK1-9971, AI and Technologies, probability guarantee, Outlier detection, Centre for Distributed Computing, Networking and Security, Electrical engineering. Electronics. Nuclear engineering, Networks
