
doi: 10.1109/asap65064.2025.00012, 10.5281/zenodo.17609103, 10.48550/arxiv.2508.01180, 10.5281/zenodo.17609102
arXiv: 2508.01180
handle: 11585/1040001
Attention-based models demand flexible hardware to manage diverse kernels with varying arithmetic intensities and memory access patterns. Large clusters with shared L1 memory, a common architectural pattern, struggle to fully utilize their processing elements (PEs) when scaled up, because throughput drops in the hierarchical PE-to-L1 intra-cluster interconnect. This paper presents the Dynamic Allocation Scheme (DAS), a runtime-programmable address remapping hardware unit coupled with a unified memory allocator, designed to minimize data access contention of PEs on the multi-banked L1. We evaluated DAS on an aggressively scaled-up 1024-PE RISC-V cluster with a Non-Uniform Memory Access (NUMA) PE-to-L1 interconnect to demonstrate its potential for improving data locality in large parallel machine learning workloads. For a Vision Transformer (ViT)-L/16 model, each encoder layer executes in 5.67 ms, achieving a 1.94× speedup over the fixed word-level interleaved baseline with 0.81 PE utilization. Implemented in 12 nm FinFET technology, DAS incurs <0.1% area overhead.
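To make the contrast between fixed word-level interleaving and a remapped layout concrete, here is a minimal Python sketch of bank selection only. It is not the DAS hardware or its allocator: the function names, the 4-byte word size, the grouping of banks into "tiles", and the `region_words` parameter are all assumptions chosen for illustration.

```python
WORD_BYTES = 4  # assumed word size, not taken from the abstract


def interleaved_bank(addr, num_banks):
    """Fixed word-level interleaving: consecutive words map to consecutive banks."""
    return (addr // WORD_BYTES) % num_banks


def remapped_bank(addr, num_banks, banks_per_tile, region_words):
    """Hypothetical remap: each contiguous region of `region_words` words is
    interleaved only across the banks of a single tile, so a PE working on
    that region touches nearby banks rather than the whole cluster."""
    word = addr // WORD_BYTES
    region = word // region_words            # which contiguous region the word is in
    num_tiles = num_banks // banks_per_tile
    tile = region % num_tiles                # regions are spread round-robin over tiles
    return tile * banks_per_tile + (word % banks_per_tile)


if __name__ == "__main__":
    addrs = [WORD_BYTES * w for w in range(8)]
    # Interleaved: the eight words spread across eight different banks.
    print([interleaved_bank(a, 16) for a in addrs])
    # Remapped: the same eight words stay within the four banks of tile 0.
    print([remapped_bank(a, 16, 4, 8) for a in addrs])
```

Under these assumptions, the interleaved mapping scatters a small buffer over many banks, while the remapped one confines it to one tile's banks, which is the kind of data locality the abstract refers to.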
Hardware Architecture, FOS: Computer and information sciences, Hardware Architecture (cs.AR), Manycore; RISC-V; Transformers
