
Anomaly detection in real-time data is widely regarded as a vital area of research, and clustering techniques have been applied effectively to it many times. Because the data are generated in real time, the time at which each instance is generated is important. Most existing clustering-based methods follow either a partitioning or a hierarchical approach and do not treat the time attributes of the dataset distinctly. This article introduces a mixed clustering approach that also takes time attributes into consideration. It is a two-phase method that first follows a partitioning approach and then an agglomerative hierarchical approach. The dataset may have mixed attributes. Phase one uses a unified metric defined on the mixed attributes, and the same metric is used to merge similar clusters in phase two. The time stamp associated with each data instance is tracked simultaneously, so phase one produces clusters with different lifetimes. In phase two, similar clusters are merged along with their lifetimes: the lifetimes of merged clusters with overlapping cores are combined using a superimposition operation, which produces a fuzzy time interval. Each resulting cluster therefore has an associated fuzzy lifetime. Data instances that belong to sparse clusters, that belong to no cluster, or that fall in a fuzzy lifetime with a low membership value can be treated as anomalies. The efficacy of the algorithms can be established using both complexity analysis and experimental studies. Experimental results on a real-world dataset and a synthetic dataset show that the proposed algorithm can detect anomalies with 90% and 98% accuracy, respectively.
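The fuzzy-lifetime idea can be illustrated with a minimal sketch. The abstract does not define the superimposition operation, so the code below assumes one simple reading: when the lifetimes of several clusters are superimposed, the membership of a time point is the fraction of those lifetimes that contain it. The function names, the closed-interval representation, and the 0.5 threshold are hypothetical and chosen only for illustration. They are not taken from the article.

```python
from typing import List, Tuple

# A crisp cluster lifetime as a closed interval (start_time, end_time).
Interval = Tuple[float, float]


def superimposed_membership(lifetimes: List[Interval], t: float) -> float:
    """Membership of time t in the fuzzy lifetime of a merged cluster.

    Assumption (not from the article): the membership is the fraction
    of the superimposed crisp lifetimes that contain t.
    """
    if not lifetimes:
        return 0.0
    covered = sum(1 for start, end in lifetimes if start <= t <= end)
    return covered / len(lifetimes)


def is_temporal_anomaly(lifetimes: List[Interval], t: float,
                        threshold: float = 0.5) -> bool:
    """Flag an instance whose time stamp has low membership.

    The threshold is a hypothetical tuning parameter.
    """
    return superimposed_membership(lifetimes, t) < threshold


# Two phase-one clusters with overlapping lifetimes, merged in phase two.
merged_lifetimes = [(0.0, 10.0), (5.0, 15.0)]
for stamp in (7.0, 2.0, 20.0):
    print(stamp, superimposed_membership(merged_lifetimes, stamp),
          is_temporal_anomaly(merged_lifetimes, stamp))
```

Under this reading, time stamps in the overlap of all merged lifetimes get full membership, time stamps covered by only some lifetimes get partial membership, and time stamps outside every lifetime get zero membership and are flagged.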
Keywords: data instances; real-time systems; k-means algorithm; agglomerative hierarchical algorithm; similarity measure; merge function
Subjects: Technology (T); Engineering (General). Civil engineering (General) (TA1-2040); Biology (General) (QH301-705.5); Physics (QC1-999); Chemistry (QD1-999); artificial_intelligence_robotics
