
We use datasets sourced from LogPAI (Log Parsing and Anomaly Detection), a research-oriented platform dedicated to log analysis and anomaly detection. Its Loghub collection is, to our knowledge, the most extensive compilation of log datasets, and it is openly available for research purposes. Our log parsing study uses seven of these datasets, originating from distributed systems (HDFS, Spark), supercomputers (Thunderbird), operating systems (Windows), mobile systems (Android), server applications (Apache), and standalone software (Proxifier). Each dataset comprises exactly 2,000 manually labeled log messages, which serve as the ground truth against which we calculate the model's accuracy.
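The section does not state which accuracy metric is computed from the ground truth. As one possibility, the sketch below assumes the grouping accuracy commonly reported on the 2,000-message Loghub samples: a message counts as correctly parsed only if the parser places it in a group containing exactly the same messages as its ground-truth event. The function name `grouping_accuracy` and its list-of-labels inputs are hypothetical; in practice the ground-truth labels would be the event IDs attached to each labeled message.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def grouping_accuracy(truth: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Hypothetical helper: fraction of messages whose predicted group
    matches their ground-truth group exactly.

    truth[i] and predicted[i] are the event labels assigned to log message i
    by the manual ground truth and by the parser, respectively.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("need at least one message")

    # Map each label to the set of message indices that carry it.
    truth_groups = defaultdict(set)
    pred_groups = defaultdict(set)
    for i, (t, p) in enumerate(zip(truth, predicted)):
        truth_groups[t].add(i)
        pred_groups[p].add(i)

    correct = 0
    for members in pred_groups.values():
        # A predicted group is correct only if it equals the ground-truth
        # group of its members exactly (no missing and no extra messages).
        some = next(iter(members))
        if truth_groups[truth[some]] == members:
            correct += len(members)
    return correct / len(truth)
```

For example, with ground truth `["E1", "E1", "E2", "E3"]` and parser output `["A", "A", "B", "B"]`, only the first two messages are grouped exactly as in the ground truth, so the accuracy is 0.5. Label names themselves do not matter, only how messages are grouped.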
Loghub, the LogPAI repository, collects system logs that are publicly available for AI-based log analysis research. Some of these logs are real data from previous studies, while others were collected from actual systems in the LogPAI team's lab. The logs are kept as close to their original form as possible, without sanitization, anonymization, or modification.
