
doi: 10.1145/3547301
As more emerging applications move to GPUs, fine-grained synchronization has become imperative. However, the performance of these applications can be severely impaired by frequent synchronization failures caused by high data contention. Unlike CPUs, GPUs have thousands of hardware threads and adopt the single-instruction, multiple-thread (SIMT) paradigm, which makes it impractical to deploy CPU contention management mechanisms directly on GPUs. In this article, we design a Software Warp Controlling Framework (SWCF), which employs a producer-consumer execution model and leverages GPU hardware barriers to dynamically control the execution of warps at runtime. Building on SWCF, we propose a contention management strategy that reduces frequent synchronization failures while avoiding an excessive loss of parallelism. We evaluate SWCF and the proposed strategy on commodity GPUs using a set of applications with fine-grained synchronization. The results show that, on a V100 GPU, our contention management achieves a 4.7X speedup and outperforms the conventional GPU software backoff solution by 42% on average.
