Powered by OpenAIRE graph
Found an issue? Give us feedback
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

EFFICIENT AND PRODUCTIVE GPU PROGRAMMING

Authors: Zhang, Mengchi;

EFFICIENT AND PRODUCTIVE GPU PROGRAMMING

Abstract

Productive programmable accelerators, like GPUs, have been developed for generations to support programming features. The ever-increasing performance improves the usability of programming features on GPUs, and these programming features further ease the porting of code and data structure from CPU to GPU. However, GPU programming features, such as function call or runtime polymorphism, have not been well explored or optimized. I identify efficient and productive GPU programming as a potential area to exploit. Although many programming paradigms are well studied and efficiently supported on CPU architectures, their performance on novel accelerators, like GPUs, has never been studied, evaluated, and made perfect. For instance, programming with functions is a commonplace programming paradigm that shapes software programs with modularity and simplifies code with reusability. A large amount of work has been proposed to alleviate function calling overhead on CPUs, however, less paper talked about its deficiencies on GPUs. On the other hand, polymorphism amplifies an object’s behaviors at runtime. A body of work targets efficient polymorphism on CPUs, but no work has ever discussed this feature under GPU contexts. In this dissertation, I discussed those two programming features on GPU architectures. First, I performed the first study to identify the deficiency of GPU polymorphism. I created micro-benchmarks to evaluate virtual function overhead in controlled settings and the first GPU polymorphic benchmark suite, ParaPoly, to investigate real-world scenarios. The micro-benchmarks indicated that the virtual function overhead is usually negligible but can cause up to a 7x slowdown. Virtual functions in ParaPoly show a geometric meaning of 77% overhead on GPUs compared to the function’s inlined version. Second, I proposed two novel techniques that determine an object’s type only by its address pointer to improve GPU polymorphism. The first technique, Coordinated Object Allocation and function Lookup (COAL) is a software-only technique that uses the object’s address to determine its type. The second technique, TypePointer, needs hardware modification to embed the object’s type information into its address pointer. COAL achieves 80% and 6% improvements, and TypePointer achieves 90% and 12% over contemporary CUDA and our type-based SharedOA. Considering the growth of GPU programs, function calls become a pervasive paradigm to be consistently used on GPUs. I also identified the overhead of excessive register spilling with function calls on GPU. To diminish this cost, I proposed a novel Massively Multithreaded Register Windowing technique with Variable Size Register Window and Register-Conscious Warp Scheduling. Our techniques improve the representative workloads with a geometric mean of 1.18x with only 1.8% hardware storage overhead.

Related Organizations
Keywords

Distributed systems and algorithms, Digital processor architectures, Programming languages, Operating systems

  • BIP!
    Impact byBIP!
    citations
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Powered by OpenAIRE graph
Found an issue? Give us feedback
citations
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
0
Average
Average
Average
Upload OA version
Are you the author? Do you have the OA version of this publication?