DTM-NUCA: Dynamic Texture Mapping-NUCA for Energy-Efficient Graphics Rendering

Article, Conference object. 01 Mar 2022, Spain.

Publisher: IEEE

Journal: 2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)

Funded by: EC | CoCoUnit

Authors: Corbalán Navarro, David; Aragón, Juan Luis; Parcerisa Bundó, Joan Manuel; González Colás, Antonio María

doi: 10.1109/pdp55904.2022.00030

handle: 2117/365471



Abstract

Modern mobile GPUs integrate an increasing number of shader cores to speed up the execution of graphics workloads. Each core has a private Texture Cache, used to apply texturing effects to objects, which is backed by a shared L2 cache. However, as in any other memory hierarchy, this organization replicates data in the upper levels (i.e., the private Texture Caches) to allow faster accesses, at the expense of reducing their overall effective capacity. For example, in a mobile GPU with four shader cores, about 84.6% of the requested texture blocks are replicated in at least one of the other private Texture Caches. This paper proposes a novel dynamically mapped Non-Uniform Cache Architecture (NUCA) organization for the private Texture Caches of a mobile GPU, aimed at increasing their overall effective capacity and decreasing the overall access latency by attacking data replication. A block that misses in the local Texture Cache may be serviced by a remote one at a lower cost than a round trip to the shared L2. The proposed Dynamic Texture Mapping-NUCA (DTM-NUCA) features a lightweight mapping table, called the Affinity Table, that is independent of the L2 cache size, unlike a traditional NUCA organization. The best owner for a given set of blocks is determined dynamically and stored in the Affinity Table to maximize local accesses. The mechanism also allows a certain amount of replication to favor local accesses where appropriate; because the capacity lost to this replication is small, performance is not hurt. DTM-NUCA is presented in two flavors: one with a centralized Affinity Table and another with a distributed Affinity Table. Experimental results first show that L2 pressure is effectively reduced, with 41.8% of L2 accesses eliminated on average. As for average latency, DTM-NUCA is very effective at favoring local over remote accesses, achieving 73.8% local accesses on average.
As a consequence, our novel DTM-NUCA organization obtains an average speedup of 16.9% and overall energy savings of 7.6% over a conventional organization.

This work has been supported by the CoCoUnit ERC Advanced Grant of the EU’s Horizon 2020 program (grant No 833057), the Spanish State Research Agency (MCIN/AEI) under grant PID2020-113172RB-I00, the ICREA Academia program and a research fellowship from the University of Murcia’s “Plan Propio de Investigación”.

Peer Reviewed
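The abstract describes the DTM-NUCA lookup path only at a high level. As we read it: check the local Texture Cache, consult the Affinity Table for the owner of the block's set, try the owner's (remote) Texture Cache, and fall back to the shared L2. The Python sketch below is a toy model of that path under stated assumptions, not the authors' implementation. The block size (`BLOCK_BYTES`), the number of blocks per Affinity Table entry (`REGION_BLOCKS`), LRU replacement, and the ownership policy (first touch, then re-mapping to the core that has requested the region most often) are all hypothetical. The paper's controlled replication is not modeled, so new fills go only into the owner's cache.

```python
from collections import OrderedDict, defaultdict

# Illustrative assumptions only; none of these values come from the paper.
BLOCK_BYTES = 64     # texture block (cache line) size
REGION_BLOCKS = 16   # consecutive blocks sharing one Affinity Table entry


class AffinityTable:
    """Centralized table: region id -> owner core (hypothetical policy)."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.owner = {}
        # Per-region request counters, one slot per core.
        self.requests = defaultdict(lambda: [0] * num_cores)

    def owner_of(self, core, block):
        region = block // REGION_BLOCKS
        counts = self.requests[region]
        counts[core] += 1
        current = self.owner.setdefault(region, core)  # first touch claims the region
        # Re-map to the most frequent requester; ties keep the current owner.
        best = max(range(self.num_cores), key=lambda c: counts[c])
        if counts[best] > counts[current]:
            self.owner[region] = best
        return self.owner[region]


class TextureCache:
    """Tiny fully associative LRU cache holding block ids."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def hit(self, block):
        if block in self.lines:
            self.lines.move_to_end(block)  # refresh LRU position
            return True
        return False

    def fill(self, block):
        self.lines[block] = True
        self.lines.move_to_end(block)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used


class DtmNucaSketch:
    """Toy model of the local -> remote -> L2 lookup path (hypothetical)."""

    def __init__(self, num_cores=4, capacity=8):
        self.table = AffinityTable(num_cores)
        self.caches = [TextureCache(capacity) for _ in range(num_cores)]
        self.stats = {"local": 0, "remote": 0, "l2": 0}

    def access(self, core, addr):
        block = addr // BLOCK_BYTES
        owner = self.table.owner_of(core, block)  # every request updates the table
        if self.caches[core].hit(block):
            self.stats["local"] += 1
            return "local"
        if owner != core and self.caches[owner].hit(block):
            self.stats["remote"] += 1  # serviced by a remote Texture Cache
            return "remote"
        self.stats["l2"] += 1          # missed everywhere: go to the shared L2
        self.caches[owner].fill(block) # fill only the owner's cache (no replica)
        return "l2"
```

Calling `access(core, addr)` for each texture request in a trace and then reading `stats` gives a local/remote/L2 breakdown analogous to the metrics reported above. The toy model's numbers are not comparable to the paper's results. Keeping a single owner per region is what removes replication in this sketch. Copies left behind in a former owner's cache after re-mapping still serve that core locally until they are evicted.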

Country

Spain

Related Organizations

University of Murcia
Spain
Universitat Politècnica de Catalunya
Spain

Keywords

GPUs, Cache, Energy efficiency, Memory management (Computer science), Mobile devices, Rendering (Computer graphics), NUCA, Graphics processing units, Graphics pipeline, UPC subject areas::Computer science::Computer architecture

Related research (8 research products)

Irregular accesses reorder unit: improving GPGPU memory coalescing for graph-based workloads (2022)
DNN pruning with principal component analysis and connection importance estimation (2022)
MEGsim: A Novel Methodology for Efficient Simulation of Graphics Workloads in GPUs (2022)
Dynamic sampling rate: harnessing frame coherence in graphics applications for energy-efficient GPUs (2022)
Characterizing self-driving tasks in general-purpose architectures (2021)
A Survey of Near-Data Processing Architectures for Neural Networks (2022)
TCOR: A Tile Cache with Optimal Replacement (2022)
Sliding window support for image processing in autonomous vehicles (2022)
