Exploring transfer learning for predicting I/O time across systems

Creator: Voß, Adrian
Keywords: machine learning, I/O, HPC, deep learning, transfer learning, explainable AI, info:eu-repo/classification/ddc/004

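German title: Untersuchung von Transfer Learning zur Vorhersage von I/O Zeiten verschiedener Systeme (Investigation of transfer learning for predicting the I/O times of different systems)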

Type: Master thesis (other literature type). Date: 01 Jan 2024. Embargo end date: 27 Sep 2024. Language: English. Publisher: RWTH Aachen University.

Authors: Voß, Adrian;

doi: 10.18154/rwth-2024-09250

Abstract

The transition from petascale to exascale systems requires HPC research to investigate I/O performance more than ever before, in order to help application developers and system owners achieve the best possible performance. In addition to the behaviour of different applications, congestion effects, global I/O weather, and system noise come into play. Recent results such as [6] and [5] by Isakov et al. demonstrate that machine-learning-based modeling is a promising tool for coping with this complexity. However, the approach by Isakov et al. requires large amounts of training data, which is why Dmytro Povaliaev proposed a transfer learning approach in his master thesis [15]. Motivated by this previous work, I present a novel deep-dive analysis workflow that makes it possible to analyse a model's prediction quality at the level of individual applications, or at an even finer granularity. Additionally, the integration of explainable AI algorithms enables practitioners of this workflow to gain detailed insights into the I/O patterns the model has learned. Using this workflow, I demonstrate that my model can predict the time that widely used HPC applications spend on I/O with an accuracy that domain experts and system owners consider usable in practice. Finally, the workflow makes it possible to isolate insufficient predictions and improve them by further fine-tuning the model.
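The abstract does not state how prediction quality is measured per application. As a rough illustration only, the Python sketch below groups predictions by application, reports the median relative error for each, and flags applications above a threshold as candidates for further fine-tuning. The record format, the relative-error metric, the 0.2 threshold, and the function names are all hypothetical assumptions for this example, not the thesis's actual workflow.

```python
from collections import defaultdict
from statistics import median

# Hypothetical record format: (application_name, true_io_time_s, predicted_io_time_s).
# The relative-error metric and the threshold below are illustrative assumptions.

def per_application_error(records):
    """Group absolute relative errors by application and return the median per application."""
    errors = defaultdict(list)
    for app, true_t, pred_t in records:
        if true_t <= 0:
            continue  # skip jobs without measurable I/O time to avoid division by zero
        errors[app].append(abs(pred_t - true_t) / true_t)
    return {app: median(errs) for app, errs in errors.items()}


def flag_poor_predictions(per_app, threshold=0.2):
    """Return applications whose median relative error exceeds the assumed threshold, worst first."""
    poor = (app for app, err in per_app.items() if err > threshold)
    return sorted(poor, key=lambda app: per_app[app], reverse=True)


if __name__ == "__main__":
    # Made-up toy records, for demonstration only.
    records = [("app_a", 10.0, 11.0), ("app_a", 20.0, 19.0), ("app_b", 5.0, 8.0)]
    per_app = per_application_error(records)
    print(per_app)
    print(flag_poor_predictions(per_app))
```

Grouping by application before aggregating keeps a few well-predicted, very common applications from masking a poorly predicted one, which is the kind of isolation the abstract describes; a finer granularity would simply use a more specific grouping key.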

Master's thesis, RWTH Aachen University, 2024; Aachen: RWTH Aachen University, 1 online resource: illustrations (2024).

Published by RWTH Aachen University, Aachen

Related Organizations

RWTH Aachen University
Germany

Impact by BIP!

- Selected citations: 0 (derived from selected sources; an alternative to the "Influence" indicator)
- Popularity: Average (the current impact/attention of the article in the research community, based on the underlying citation network)
- Influence: Average (the overall impact of the article in the research community, based on the underlying citation network over time)
- Impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)