The Best of Many Worlds: Scheduling Machine Learning Inference on CPU-GPU Integrated Architectures

Publication: Article, Conference object
Date: 01 May 2022
Publisher: IEEE
Journal: 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Funded by: EC | C4IIoT, EC | CONCORDIA, EC | MARVEL +1 projects

Authors: Giorgos Vasiliadis; Rafail Tsirbas; Sotirios Ioannidis

doi: 10.1109/ipdpsw55747.2022.00017, 10.5281/zenodo.6410911, 10.5281/zenodo.6410912

Abstract

A plethora of applications now use machine learning, whose operations are becoming more complex and require additional computing power. At the same time, typical commodity systems (including desktops, servers, and embedded devices) now offer several processing devices, most commonly multi-core CPUs, integrated GPUs, and discrete GPUs. In this paper, we follow a data-driven approach. We first measure the performance of different processing devices when executing a diverse set of inference engines: different devices perform best for different performance metrics (e.g., throughput, latency, and power consumption), and these metrics may also vary significantly among applications. Based on these findings, we propose an adaptive scheduling approach, tailored for machine learning inference operations, that enables the use of the most efficient processing device available. Our scheduler is device-agnostic and can respond quickly to dynamic fluctuations that occur in real time, such as data bursts, application overloads, and system changes. The experimental results show that it is able to match the peak throughput by correctly predicting the optimal processing device with an accuracy of 92.5%, with energy savings of up to 10%.
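The abstract does not describe the scheduler's internals, so the following is only a minimal sketch of the general idea of metric-driven device selection, not the authors' algorithm. It assumes each device is profiled at runtime and that an exponentially weighted moving average is enough to follow load changes. The names `DeviceStats` and `AdaptiveSelector`, the smoothing factor `alpha`, and the "energy per inference" criterion are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeviceStats:
    """Smoothed runtime measurements for one processing device (hypothetical)."""
    throughput: float = 0.0  # inferences per second
    latency: float = 0.0     # seconds per batch
    power: float = 0.0       # watts
    samples: int = 0

    def update(self, throughput, latency, power, alpha):
        # The first sample initializes the averages; later samples are blended
        # in with weight alpha so that recent behavior dominates.
        if self.samples == 0:
            self.throughput, self.latency, self.power = throughput, latency, power
        else:
            self.throughput = alpha * throughput + (1 - alpha) * self.throughput
            self.latency = alpha * latency + (1 - alpha) * self.latency
            self.power = alpha * power + (1 - alpha) * self.power
        self.samples += 1


class AdaptiveSelector:
    """Picks a device for the next batch from measured metrics (illustrative only)."""

    def __init__(self, devices, alpha=0.5):
        self.alpha = alpha
        self.stats = {name: DeviceStats() for name in devices}

    def record(self, device, throughput, latency, power):
        self.stats[device].update(throughput, latency, power, self.alpha)

    def best(self, metric="throughput"):
        # Devices with no measurements are tried first so each gets profiled.
        for name, s in self.stats.items():
            if s.samples == 0:
                return name
        if metric == "throughput":
            return max(self.stats, key=lambda d: self.stats[d].throughput)
        if metric == "latency":
            return min(self.stats, key=lambda d: self.stats[d].latency)
        if metric == "energy":
            # Joules per inference = watts / (inferences per second).
            def joules(d):
                s = self.stats[d]
                return s.power / s.throughput if s.throughput > 0 else float("inf")
            return min(self.stats, key=joules)
        raise ValueError(f"unknown metric: {metric}")
```

In use, a caller would invoke `record(...)` after each batch with whatever measurements the platform exposes and ask `best(metric)` before dispatching the next one. A smaller `alpha` makes the selector slower to react to bursts but less sensitive to noisy samples.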

Related Organizations

Technical University of Crete
Greece
Foundation for Research and Technology Hellas
Greece
Hellenic Mediterranean University
Greece

Keywords

heterogeneous accelerators, inference, machine learning, gpgpu, GPU, scheduling

Impact by BIP!

selected citations: 6. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.