
doi: 10.1145/3408060
This article presents a novel energy-efficient cache design for massively parallel, throughput-oriented architectures such as GPUs. Unlike the L1 data cache on modern GPUs, the L2 cache, which is shared by all streaming multiprocessors, is not the primary performance bottleneck, but it does consume a large amount of chip energy. We observe that the L2 cache is significantly underutilized: it spends 95.6% of its time storing useless data. If this “dead time” in L2 can be identified and reduced, L2’s energy efficiency can be drastically improved. Fortunately, we find that the SIMT programming model of GPUs gives threads a unique property, instruction-level data locality similarity, which can be used to accurately predict data re-reference counts at the L2 cache-block level. We propose a simple design that leverages this Locality Similarity to build an energy-efficient GPU L2 Cache, named LoSCache. Specifically, LoSCache uses the data locality information from a small group of cooperative thread arrays (CTAs) to dynamically predict the L2-level data re-reference counts of the remaining CTAs. Individual L2 cache lines can then be powered off once they are predicted to be “dead” after a certain number of accesses. Experimental results on a wide range of applications show that the proposed design reduces L2 cache energy by an average of 64% with only 0.5% performance loss. In addition, LoSCache is cost-effective, independent of the scheduling policy, and compatible with state-of-the-art L1 cache designs for additional energy savings.
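To make the prediction idea concrete, here is a minimal software sketch, not the LoSCache hardware design. It assumes a hypothetical access trace of `(cta_id, pc, block_addr)` tuples. Re-reference counts are learned per inserting instruction (PC) from a few sample CTAs, then applied to the remaining CTAs, where a line whose predicted references are used up is marked dead (a candidate for power-off). The function names, the trace format, and the "most common observed count" rule are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def learn_reref_counts(trace, sample_ctas):
    """Learn, per inserting instruction (PC), how many times an L2 block is
    referenced, using only accesses issued by the sampled CTAs.
    Hypothetical trace format: (cta_id, pc, block_addr)."""
    refs = Counter()   # block_addr -> references observed
    first_pc = {}      # block_addr -> PC of the access that brought it into L2
    for cta, pc, block in trace:
        if cta not in sample_ctas:
            continue
        refs[block] += 1
        first_pc.setdefault(block, pc)
    per_pc = defaultdict(list)
    for block, count in refs.items():
        per_pc[first_pc[block]].append(count)
    # Assumption: predict the most common count observed for each PC.
    return {pc: Counter(c).most_common(1)[0][0] for pc, c in per_pc.items()}


def predict_dead_blocks(trace, sample_ctas, predictions):
    """Replay the remaining CTAs' accesses and return blocks, in order, whose
    predicted references are exhausted (i.e., lines that could be powered off)."""
    remaining = {}     # block_addr -> predicted references left
    dead = []
    for cta, pc, block in trace:
        if cta in sample_ctas:
            continue
        if block not in remaining:
            # Unknown PC -> None: conservatively keep the line powered on.
            remaining[block] = predictions.get(pc)
        if remaining[block] is None:
            continue
        remaining[block] -= 1
        if remaining[block] == 0:
            dead.append(block)
    return dead


# Hypothetical trace: PC 0x10 loads a block reused once; PC 0x20 loads a
# block used only once. CTA 0 is the sample group.
trace = [
    (0, 0x10, 100), (0, 0x20, 101), (0, 0x10, 100),
    (1, 0x10, 200), (1, 0x20, 201), (1, 0x10, 200),
    (2, 0x10, 300), (2, 0x20, 301), (2, 0x10, 300),
]
preds = learn_reref_counts(trace, sample_ctas={0})
dead = predict_dead_blocks(trace, sample_ctas={0}, predictions=preds)
print(preds, dead)
```

Because every CTA runs the same kernel code, the same PC tends to produce the same reuse pattern across CTAs. That property is what lets a small sample stand in for the rest. A real design would also have to handle mispredictions, such as a "dead" line that is accessed again. This sketch does not model them.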
