
System traces are collections of time-stamped messages recorded by the operating system while the system is running. Analysis of these traces is crucial for tasks such as system fault finding. Detecting anomalies in system behavior is especially important in safety-critical and time-sensitive systems, where delayed detection can lead to catastrophic outcomes. We therefore focus on developing a lightweight and explainable approach for such systems. Given a set of system traces recorded under normal and anomalous conditions, trace-based anomaly detection aims to classify a trace as anomalous or normal. In this work, we introduce GWAD, a greedy workflow graph framework for anomaly detection: a novel greedy graph construction approach for both offline and online anomaly detection in system traces. Our approach uses both the order in which events occur and the time intervals between their occurrences to learn normal system behavior. We propose two methods: first, an offline method that classifies a trace as anomalous or normal using event occurrence workflow graphs; and second, an online streaming algorithm that monitors events as they occur in real time to detect anomalies, thereby increasing system resilience. Our approach also provides reasoning for the cause of anomalous behavior. We show that GWAD outperforms traditional state-of-the-art models. The paper demonstrates the technical feasibility and viability of GWAD through multiple case studies using traces from a field-tested hexacopter.
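
To make the idea of a workflow graph over event order and inter-event time concrete, the following is a minimal illustrative sketch and not the GWAD algorithm itself. It assumes a trace is a list of `(timestamp, event_name)` pairs, learns the transitions and time-gap ranges seen in normal traces, and flags any step that leaves that envelope, returning a human-readable reason. The names `WorkflowGraph`, `classify_offline`, `OnlineMonitor`, and the `slack` parameter are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

Event = tuple[float, str]  # (timestamp in seconds, event name); assumed trace format


@dataclass
class WorkflowGraph:
    """Hypothetical workflow graph: edges between consecutive events, with gap ranges."""

    # (src, dst) -> (smallest gap, largest gap) observed in normal traces
    edges: dict[tuple[str, str], tuple[float, float]] = field(default_factory=dict)
    starts: set[str] = field(default_factory=set)

    def fit(self, normal_traces: list[list[Event]]) -> "WorkflowGraph":
        for trace in normal_traces:
            if trace:
                self.starts.add(trace[0][1])
            for (t0, a), (t1, b) in zip(trace, trace[1:]):
                gap = t1 - t0
                lo, hi = self.edges.get((a, b), (gap, gap))
                self.edges[(a, b)] = (min(lo, gap), max(hi, gap))
        return self

    def check(self, prev: Event | None, cur: Event, slack: float = 0.0) -> str | None:
        """Return a reason string if the step prev -> cur looks anomalous, else None."""
        if prev is None:
            if cur[1] in self.starts:
                return None
            return f"unexpected first event {cur[1]!r}"
        edge = (prev[1], cur[1])
        if edge not in self.edges:
            return f"unseen transition {prev[1]!r} -> {cur[1]!r}"
        lo, hi = self.edges[edge]
        gap = cur[0] - prev[0]
        if not (lo - slack <= gap <= hi + slack):
            return (f"gap {gap:.3f}s on {prev[1]!r} -> {cur[1]!r} "
                    f"outside [{lo:.3f}, {hi:.3f}]")
        return None


def classify_offline(graph: WorkflowGraph, trace: list[Event],
                     slack: float = 0.0) -> list[str]:
    """Offline mode: scan a complete trace; an empty result means 'normal'."""
    reasons: list[str] = []
    prev: Event | None = None
    for ev in trace:
        reason = graph.check(prev, ev, slack)
        if reason is not None:
            reasons.append(reason)
        prev = ev
    return reasons


class OnlineMonitor:
    """Online mode: feed events one at a time as they arrive."""

    def __init__(self, graph: WorkflowGraph, slack: float = 0.0) -> None:
        self.graph = graph
        self.slack = slack
        self.prev: Event | None = None

    def observe(self, ev: Event) -> str | None:
        reason = self.graph.check(self.prev, ev, self.slack)
        self.prev = ev
        return reason
```

In this sketch the returned reason strings stand in for the explainability the abstract describes: each flagged step names the transition or time gap that departed from the learned normal behavior. The min/max gap envelope is an assumption made for simplicity; the paper's actual graph construction and thresholds may differ.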
