<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=undefined&type=result"></script>');
-->
</script>

COPY SCRIPT

For further information contact us at helpdesk@openaire.eu

EFFICIENT AND PRODUCTIVE GPU PROGRAMMING

Name: EFFICIENT AND PRODUCTIVE GPU PROGRAMMING
Creator: Zhang, Mengchi
Keywords: Distributed systems and algorithms, Digital processor architectures, Programming languages, Operating systems

descriptionPublicationkeyboard_double_arrow_right Thesis 01 Jan 2022Publisher:Purdue University Graduate School

Authors: Zhang, Mengchi;

doi: 10.25394/pgs.20393985.v1 , 10.25394/pgs.20393985

EFFICIENT AND PRODUCTIVE GPU PROGRAMMING

- Summary
- Subjects
- Metrics

Abstract

Productive programmable accelerators, like GPUs, have been developed for generations to support programming features. The ever-increasing performance improves the usability of programming features on GPUs, and these programming features further ease the porting of code and data structure from CPU to GPU. However, GPU programming features, such as function call or runtime polymorphism, have not been well explored or optimized. I identify efficient and productive GPU programming as a potential area to exploit. Although many programming paradigms are well studied and efficiently supported on CPU architectures, their performance on novel accelerators, like GPUs, has never been studied, evaluated, and made perfect. For instance, programming with functions is a commonplace programming paradigm that shapes software programs with modularity and simplifies code with reusability. A large amount of work has been proposed to alleviate function calling overhead on CPUs, however, less paper talked about its deficiencies on GPUs. On the other hand, polymorphism amplifies an object’s behaviors at runtime. A body of work targets efficient polymorphism on CPUs, but no work has ever discussed this feature under GPU contexts. In this dissertation, I discussed those two programming features on GPU architectures. First, I performed the first study to identify the deficiency of GPU polymorphism. I created micro-benchmarks to evaluate virtual function overhead in controlled settings and the first GPU polymorphic benchmark suite, ParaPoly, to investigate real-world scenarios. The micro-benchmarks indicated that the virtual function overhead is usually negligible but can cause up to a 7x slowdown. Virtual functions in ParaPoly show a geometric meaning of 77% overhead on GPUs compared to the function’s inlined version. Second, I proposed two novel techniques that determine an object’s type only by its address pointer to improve GPU polymorphism. The first technique, Coordinated Object Allocation and function Lookup (COAL) is a software-only technique that uses the object’s address to determine its type. The second technique, TypePointer, needs hardware modification to embed the object’s type information into its address pointer. COAL achieves 80% and 6% improvements, and TypePointer achieves 90% and 12% over contemporary CUDA and our type-based SharedOA. Considering the growth of GPU programs, function calls become a pervasive paradigm to be consistently used on GPUs. I also identified the overhead of excessive register spilling with function calls on GPU. To diminish this cost, I proposed a novel Massively Multithreaded Register Windowing technique with Variable Size Register Window and Register-Conscious Warp Scheduling. Our techniques improve the representative workloads with a geometric mean of 1.18x with only 1.8% hardware storage overhead.

Related Organizations

Purdue University System
United States

Keywords

Distributed systems and algorithms, Digital processor architectures, Programming languages, Operating systems

Impact byBIP!

	citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

Average

Upload OA version

Are you the author? Do you have the OA version of this publication?

uploadUpload now!