
doi: 10.1109/hpcc.2014.71
With the rapid growth in demand for massive data processing and the slowing of process-technology scaling in microprocessors, the GPGPU has attracted growing attention as a source of massive data parallelism. Tightly coupled CPU and GPGPU designs that share the LLC (last-level cache) enable fine-grained workload offloading between the CPU and the GPGPU. In this paper, we focus on one data transfer pattern in which the data usually consist of independent elements, each of which is processed by the other processor as soon as it is ready. Traditionally, the CPU prepares all the data to be processed by the GPGPU before launching the GPGPU. This causes long waiting times, and the shared LLC may suffer from cache thrashing if the working set does not fit in the LLC. To alleviate these problems, we propose the LLC buffer as a data transfer mechanism between the CPU and the GPGPU on a shared LLC. The LLC buffer uses part of the LLC storage as one or more stream buffers and stashes each data element as an independent transfer unit. With the LLC buffer, we achieve an average speedup of 1.48x and reduce memory writes (cache evictions) from the LLC by 1346x.
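The contrast between bulk staging and element-wise hand-off can be illustrated with a small software model. This is a hypothetical sketch, not the paper's hardware mechanism: `StreamBuffer`, `bulk_transfer`, and `streamed_transfer` are illustrative names, a bounded FIFO stands in for the LLC region reserved as a stream buffer, and producer and consumer run in lockstep in a single thread. The model reports how many elements the producer emits before the consumer processes its first one (a proxy for waiting time) and the peak number of elements resident at once (a proxy for the LLC footprint).

```python
from collections import deque

_DONE = object()  # sentinel marking an exhausted input stream


class StreamBuffer:
    """Bounded FIFO modelling an LLC region used as a stream buffer (hypothetical)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.slots = deque()
        self.peak = 0  # highest occupancy observed

    def full(self):
        return len(self.slots) >= self.capacity

    def empty(self):
        return not self.slots

    def push(self, item):
        if self.full():
            raise OverflowError("stream buffer full")
        self.slots.append(item)
        self.peak = max(self.peak, len(self.slots))

    def pop(self):
        return self.slots.popleft()


def bulk_transfer(data, work):
    """Baseline: stage every element before the consumer starts."""
    staged = list(data)  # whole working set resident at once
    results = [work(x) for x in staged]
    # (results, elements produced before first consumption, peak resident)
    return results, len(staged), len(staged)


def streamed_transfer(data, work, capacity):
    """Element-wise hand-off: consume each element as soon as it is ready."""
    buf = StreamBuffer(capacity)
    results, pushes, first_pop_after = [], 0, None
    it = iter(data)
    pending = next(it, _DONE)
    while pending is not _DONE or not buf.empty():
        # Producer step: stash one ready element if the buffer has room.
        if pending is not _DONE and not buf.full():
            buf.push(pending)
            pushes += 1
            pending = next(it, _DONE)
        # Consumer step: process one element whenever one is available.
        if not buf.empty():
            if first_pop_after is None:
                first_pop_after = pushes
            results.append(work(buf.pop()))
    return results, first_pop_after, buf.peak


if __name__ == "__main__":
    square = lambda x: x * x
    print(bulk_transfer(range(8), square))
    print(streamed_transfer(range(8), square, capacity=2))
```

In this toy model both variants compute the same results, but the bulk baseline makes the consumer wait for all 8 elements and keeps all 8 resident, while the streamed variant starts after the first element and never holds more than the buffer capacity. That bounded footprint is the intuition behind avoiding thrashing when the working set exceeds the LLC.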
