Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ ZENODOarrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Dataset . 2018
License: CC BY
Data sources: Datacite
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Dataset . 2018
License: CC BY
Data sources: Datacite
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Dataset . 2018
License: CC BY
Data sources: Datacite
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
ZENODO
Dataset . 2018
Data sources: Datacite
versions View all 4 versions
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

Antarex HPC Fault Dataset

Authors: Netti, Alessio; Kiziltan, Zeynep; Ozalp Babaoglu; Sirbu, Alina; Bartolini, Andrea; Borghesi, Andrea;

Antarex HPC Fault Dataset

Abstract

The Antarex dataset contains trace data collected from the homonymous experimental HPC system located at ETH Zurich while it was subjected to fault injection, for the purpose of conducting machine learning-based fault detection studies for HPC systems. Acquiring our own dataset was made necessary by the fact that commercial HPC system operators are very reluctant to share trace data containing information about faults in their systems. In order to acquire data, we executed benchmark applications and at the same time injected faults in the system at specific times via dedicated programs, so as to trigger anomalies in the behaviour of the applications. A wide range of faults is covered in our dataset, from hardware faults, to misconfiguration faults, and finally to performance anomalies cause by interference from other processes. This was achieved through the FINJ fault injection tool, developed by the authors. The dataset contains two types of data: one type of data refers to a series of CSV files, each containing a set of system performance metrics sampled through the LDMS HPC monitoring framework. Another type refers to the log files detailing the status of the system (i.e., currently running benchmark applications or injected fault programs) at each time point in the dataset. Such a structure enables researchers to perform a wide range of studies on the dataset. Moreover, since we collected the dataset by streaming continuous data, any study based on it will easily be reproducible on a real HPC system, in an online way. The dataset is divided in two parts: the first includes only the CPU and memory-related benchmark applications and fault programs, while the second is strictly hard drive-related. We executed each part in both single-core and multi-core variants, resulting in a total of 4 dataset blocks for 32 days of data acquisition, and 20GB of uncompressed data. For a detailed analysis on the structure and features of the Antarex dataset, please refer to the research paper "Online Fault Classification in HPC System through Machine Learning", by Netti et al. Additional details can be found in the research paper "FINJ: a Fault Injection Tool for HPC System" by Netti et al., whereas all source code can be found on the GitHub repository of the FINJ tool.

The archive contains 4 directories, one for each block of the dataset - namely CPU/Memory and HDD, in single-core and multi-core variants. In each of these directories, you will find the following: a 7z archive containing the LDMS CSV files for each of the 7 used plugins; FINJ workloads and execution logs; the histograms for the durations and inter-arrival times of fault tasks in PDF format; launch scripts, if any. Source code for all of the injected fault programs and additional details can be found on the GitHub repository of the FINJ tool.

{"references": ["A. Netti, Z. Kiziltan, O. Babaoglu, A. Sirbu, A. Bartolini, and A. Borghesi, \"Online Fault Classification in HPC Systems through Machine Learning\" in Proc. of IPDPS 2019 (submitted)", "A. Netti, Z. Kiziltan, O. Babaoglu, A. Sirbu, A. Bartolini, and A. Borghesi, \"FINJ: A fault injection tool for HPC systems,\" in Proc. of Resilience Workshop 2018. Springer, 2018. Available: https://github.com/AlessioNetti/faultinjector"]}

Keywords

Machine Learning, Monitoring, Exascale systems, Fault Detection, High-performance computing

  • BIP!
    Impact byBIP!
    citations
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    1
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
    OpenAIRE UsageCounts
    Usage byUsageCounts
    visibility views 41
    download downloads 11
  • 41
    views
    11
    downloads
    Powered byOpenAIRE UsageCounts
Powered by OpenAIRE graph
Found an issue? Give us feedback
visibility
download
citations
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
views
OpenAIRE UsageCountsViews provided by UsageCounts
downloads
OpenAIRE UsageCountsDownloads provided by UsageCounts
1
Average
Average
Average
41
11