Name: Taking GPU Programming Models to Task for Performance Portability
Keywords: Performance (cs.PF), FOS: Computer and information sciences, Performance, Distributed, Parallel, and Cluster Computing, Distributed, Parallel, and Cluster Computing (cs.DC)

Type: Article, Preprint
Date: 08 Jun 2025
Embargo end date: 01 Jan 2024
Publisher: ACM
Journal: Proceedings of the 39th ACM International Conference on Supercomputing

Authors: Joshua Hoke Davis; Pranav Sivaraman; Joy Kitson; Konstantinos Parasyris; Harshitha Menon; Isaac Minn; Giorgis Georgakoudis; and 1 more author

DOI: 10.1145/3721145.3730423, 10.48550/arxiv.2402.08950

arXiv: 2402.08950



Abstract

Portability is critical to ensuring high productivity in developing and maintaining scientific software as the diversity of on-node hardware architectures increases. While several programming models provide portability across diverse GPU systems, they make no guarantees about performance portability. In this work, we explore several programming models (CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL) to assess the consistency of their performance across NVIDIA and AMD GPUs. We use five proxy applications from different scientific domains, create implementations where they are missing, and use them to present a comprehensive comparative evaluation of the performance portability of these programming models. We provide a methodology based on Spack scripting to ensure the reproducibility of the experiments conducted in this work. Finally, we analyze why some programming models underperform in certain scenarios and, in some cases, present performance optimizations to the proxy applications.
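The abstract refers to a "comparative evaluation of the performance portability" of each programming model but does not define the metric on this page. One widely used definition, assumed here purely for illustration and not confirmed by this page to be the paper's exact formulation, is the harmonic-mean metric of Pennycook et al.: for a platform set H, PP = |H| / Σ_{i∈H} 1/e_i, where e_i is the efficiency on platform i, and PP = 0 if the implementation does not run on every platform in H. The sketch below computes it from per-platform efficiencies. The helper names `application_efficiency` and `performance_portability`, the platform labels, and the timings are all hypothetical.

```python
from typing import Mapping, Optional


def application_efficiency(time_s: float, best_time_s: float) -> float:
    """Hypothetical helper: efficiency of one implementation on one platform,
    relative to the fastest observed implementation on that same platform."""
    if time_s <= 0.0 or best_time_s <= 0.0:
        raise ValueError("times must be positive")
    return best_time_s / time_s


def performance_portability(efficiencies: Mapping[str, Optional[float]]) -> float:
    """Harmonic mean of per-platform efficiencies (assumed Pennycook et al. metric).

    `efficiencies` maps a platform name to an efficiency in (0, 1], or to None
    when the implementation does not run on that platform. Any unsupported
    platform makes the score 0, by definition of the metric.
    """
    if not efficiencies:
        raise ValueError("need at least one platform")
    inverse_sum = 0.0
    for platform, e in efficiencies.items():
        if e is None:
            return 0.0  # unsupported platform: not portable over this set
        if not 0.0 < e <= 1.0:
            raise ValueError(f"efficiency for {platform!r} must be in (0, 1], got {e}")
        inverse_sum += 1.0 / e
    return len(efficiencies) / inverse_sum


if __name__ == "__main__":
    # Hypothetical timings (seconds) for one proxy app on two GPU platforms.
    eff = {
        "nvidia_gpu": application_efficiency(time_s=1.2, best_time_s=1.0),
        "amd_gpu": application_efficiency(time_s=2.0, best_time_s=1.2),
    }
    print(f"PP = {performance_portability(eff):.3f}")
```

The harmonic mean is used instead of the arithmetic mean so that a single poorly performing platform pulls the score down sharply, which matches the intuition that a model is only as portable as its weakest target.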

16 pages, 5 figures

Related Organizations

- University of Maryland, College Park (United States)
- Lawrence Livermore National Laboratory (United States)
- Department of Computer Science, University of Maryland (United States)
- University of Maryland University College (United States)



Impact (BIP!)

- Selected citations: 0
- Popularity: Average
- Influence: Average
- Impulse: Average



Funded by

- UKRI | R2LIB (Reclamation, Remanufacture of Li Ion Batteries)
- NSF | Graduate Research Fellowship Program (GRFP)