The current trend in recently released Graphics Processing Units (GPUs) is to exploit transistor scaling at the architectural level; hence, every new chip generation ships larger and larger GPUs. Architecturally, this means that the number of clusters of parallel processing elements embedded within a single GPU die keeps increasing, posing novel and interesting performance-engineering challenges in latency-sensitive scenarios. A single GPU kernel is now unlikely to scale linearly when dispatched to a GPU featuring a larger cluster count, either because VRAM bandwidth acts as a bottleneck or because the kernel cannot saturate the massively parallel compute power available in these novel architectures. In this context, novel scheduling approaches can be derived by treating the GPU as a partitionable compute engine in which multiple concurrent kernels are scheduled onto non-overlapping sets of clusters. While such an approach is very effective at improving overall GPU utilization, it makes it significantly harder to estimate kernel execution latencies when kernels are dispatched to variable-sized GPU partitions. Moreover, memory interference among co-running kernels is a mandatory aspect to consider. In this work, we derive a practical yet fairly accurate memory-aware latency estimation model for co-running GPU kernels.
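The kind of model the abstract describes can be illustrated with a simple roofline-style sketch: latency is bounded by whichever is slower, compute on the assigned partition or VRAM traffic under a bandwidth share reduced by co-runners. All function names, parameters, and the proportional-sharing assumption below are hypothetical illustrations, not the model actually derived in the paper.

```python
# Illustrative roofline-style latency estimate for a kernel running on a
# GPU partition of `clusters` processing clusters, sharing VRAM bandwidth
# with co-running kernels. Names and the sharing policy are assumptions.

def predict_latency_ms(flops, dram_bytes, clusters,
                       flops_per_cluster_per_ms, vram_bw_bytes_per_ms,
                       co_runner_dram_bytes=0.0):
    """Return an estimated kernel latency in milliseconds."""
    # Compute time shrinks (ideally linearly) with the partition size.
    compute_ms = flops / (clusters * flops_per_cluster_per_ms)
    # VRAM bandwidth is shared: model interference by granting the kernel
    # a traffic-proportional share of the total bandwidth.
    total_traffic = dram_bytes + co_runner_dram_bytes
    bw_share = dram_bytes / total_traffic if total_traffic else 1.0
    memory_ms = dram_bytes / (vram_bw_bytes_per_ms * bw_share)
    # Latency is bounded by the slower of the two resources.
    return max(compute_ms, memory_ms)
```

Under this toy model, a compute-bound kernel halves its latency when its partition doubles, while a memory-bound kernel's latency is unaffected by extra clusters but degrades in proportion to the co-runners' memory traffic, which is exactly why a memory-aware term is needed.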
Keywords: Latency Prediction; Concurrent Kernels; GPGPU-Simulator; Partitionable GPU
| Indicator | Value |
| --- | --- |
| selected citations | 1 |
| popularity | Average |
| influence | Average |
| impulse | Average |
| views | 100 |
| downloads | 9 |

Views provided by UsageCounts
Downloads provided by UsageCounts