Name: NVR: Vector Runahead on NPUs for Sparse Memory Access
Keywords: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Hardware Architecture (cs.AR), Computer Science - Hardware Architecture

Publication: Article, Preprint
Date: 22 Jun 2025
Embargo end date: 01 Jan 2025
Publisher: IEEE
Journal: 2025 62nd ACM/IEEE Design Automation Conference (DAC)

Authors: Wang, Hui; Zhao, Zhengpeng; Wang, Jing; Du, Yushu; Cheng, Yuan; Guo, Bing; Xiao, He; +7 Authors

doi: 10.1109/dac63849.2025.11132724 , 10.48550/arxiv.2502.13873

arXiv: 2502.13873



Abstract

Deep neural networks (DNNs) increasingly exploit sparsity to curb the growth of model parameter counts. However, translating sparsity and pruning into lower wall-clock time remains challenging, because irregular memory access patterns cause frequent cache misses. In this paper, we present NPU Vector Runahead (NVR), a prefetching mechanism tailored for NPUs to address cache misses in sparse DNN workloads. Rather than optimising memory patterns at the cost of high overhead and poor portability, NVR adapts runahead execution to the unique architecture of NPUs. NVR provides a general micro-architectural solution for sparse DNN workloads without requiring compiler or algorithmic support. It operates as a decoupled, speculative, lightweight hardware sub-thread alongside the NPU, with minimal hardware overhead (under 5%). NVR achieves an average 90% reduction in cache misses compared to state-of-the-art (SOTA) prefetching in general-purpose processors, and delivers a 4x average speedup on sparse workloads versus NPUs without prefetching. Moreover, we investigate the advantages of incorporating a small cache (16KB) into the NPU in combination with NVR. Our evaluation shows that expanding this modest cache delivers 5x higher performance benefits than increasing the L2 cache size by the same amount.
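To make the idea of prefetching for irregular, index-driven accesses concrete, the following is a minimal toy sketch in Python. It is not the NVR design and does not reproduce any figure from the abstract. The names `LRUCache`, `sparse_gather`, and `demo`, the 64-byte line size, the 4-byte element size, and the lookahead distance are all assumptions made for illustration. The model also ignores memory latency, so a prefetch completes instantly, which real hardware cannot do.

```python
import random
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache line size (illustrative)
ELEM_BYTES = 4   # assumed element size (illustrative)


class LRUCache:
    """Tiny fully associative LRU cache that counts demand misses only."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()
        self.demand_misses = 0

    def _touch(self, line):
        """Bring `line` into the cache; return True if it was already there."""
        hit = line in self.lines
        if hit:
            self.lines.move_to_end(line)
        else:
            self.lines[line] = True
            if len(self.lines) > self.num_lines:
                self.lines.popitem(last=False)  # evict least recently used
        return hit

    def load(self, addr):
        """Demand access issued by the main computation."""
        if not self._touch(addr // LINE_BYTES):
            self.demand_misses += 1

    def prefetch(self, addr):
        """Speculative access; fills the cache but is not counted as a miss."""
        self._touch(addr // LINE_BYTES)


def sparse_gather(indices, cache, lookahead=0):
    """Load dense[idx] for each idx in `indices`.

    With lookahead > 0, a hypothetical runahead helper reads the index
    array `lookahead` entries ahead and prefetches the target lines.
    Returns the number of demand misses.
    """
    for i, idx in enumerate(indices):
        j = i + lookahead
        if lookahead and j < len(indices):
            cache.prefetch(indices[j] * ELEM_BYTES)
        cache.load(idx * ELEM_BYTES)
    return cache.demand_misses


def demo(seed=0, n=2000, lookahead=8):
    """Compare demand misses without and with the toy runahead helper."""
    rng = random.Random(seed)
    indices = [rng.randrange(1 << 16) for _ in range(n)]
    baseline = sparse_gather(indices, LRUCache(64))
    with_runahead = sparse_gather(indices, LRUCache(64), lookahead=lookahead)
    return baseline, with_runahead


if __name__ == "__main__":
    print(demo())
```

In this sketch the only demand misses left with lookahead enabled are among the first `lookahead` accesses, because nothing has run ahead of them yet. A real prefetcher must also hide memory latency and avoid polluting the cache, which this toy does not model.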

Related Organizations

Dalian Polytechnic University, China (People's Republic of)
Huazhong University of Science and Technology, China (People's Republic of)
Harbin Institute of Technology, China (People's Republic of)
Southeast University, China (People's Republic of)


Impact by BIP!

Selected citations: 0
Popularity: Average
Influence: Average
Impulse: Average


Related to Research communities

UArctic