CaMDN: Enhancing Cache Efficiency for Multi-tenant DNNs on Integrated NPUs

Keywords: FOS: Computer and information sciences, Computer Science - Operating Systems, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Operating Systems (cs.OS), Hardware Architecture (cs.AR), Computer Science - Hardware Architecture

Tianhao Cai; Liang Wang 0020; Limin Xiao; Meng Han; Zeyu Wang; Lin Sun; Xiaojian Liao

arXiv.org e-Print Archive

Preprint . 2025

Data sources: arXiv.org e-Print Archive

https://doi.org/10.1109/dac638...

Article . 2025 . Peer-reviewed

License: STM Policy #29

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2025

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

DBLP

Article

Data sources: DBLP

DBLP

Conference object

Data sources: DBLP

Publication type: Article, Preprint, Conference object
Date: 22 Jun 2025
Embargo end date: 01 Jan 2025
Publisher: IEEE
Journal: 2025 62nd ACM/IEEE Design Automation Conference (DAC)

doi: 10.1109/dac63849.2025.11132424 , 10.48550/arxiv.2505.06625

arXiv: 2505.06625

Abstract

With the rapid development of DNN applications, multi-tenant execution, in which multiple DNNs are co-located on a single SoC, is becoming a prevailing trend. Although prior work has proposed many methods to improve multi-tenant performance, the impact of the shared cache has not been well studied. This paper proposes CaMDN, an architecture-scheduling co-design that enhances cache efficiency for multi-tenant DNNs on integrated NPUs. Specifically, a lightweight architecture supports model-exclusive, NPU-controlled regions inside the shared cache to eliminate unexpected cache contention. In addition, a cache scheduling method improves shared cache utilization: it comprises a cache-aware mapping method that adapts to the varying available cache capacity and a dynamic allocation algorithm that adjusts cache usage among co-located DNNs at runtime. Compared to prior works, CaMDN reduces memory access by 33.4% on average and achieves a model speedup of up to 2.56$\times$ (1.88$\times$ on average).

7 pages, 9 figures. This paper has been accepted to the 2025 Design Automation Conference (DAC)
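
The abstract names two scheduling ingredients: mappings that adapt to the cache capacity a model is given, and an allocator that divides the shared cache among co-located DNNs. The abstract does not describe either algorithm, so the following Python sketch is only a toy illustration of how the two ideas could fit together, not CaMDN's method. Every name (`Mapping`, `best_mapping`, `allocate`), the greedy policy, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One candidate mapping of a model (hypothetical cost model)."""
    cache_kib: int  # size of the exclusive cache region this mapping assumes
    dram_kib: int   # estimated off-chip traffic when run with that region


def best_mapping(mappings, budget_kib):
    """Cache-aware mapping: pick the lowest-traffic mapping that fits the budget.

    Assumes every model provides a mapping with cache_kib == 0 as a fallback.
    """
    feasible = [m for m in mappings if m.cache_kib <= budget_kib]
    return min(feasible, key=lambda m: m.dram_kib)


def allocate(models, total_kib, step_kib):
    """Greedy allocation: give each step of cache to the model that gains most."""
    alloc = {name: 0 for name in models}
    free = total_kib

    def gain(name):
        now = best_mapping(models[name], alloc[name]).dram_kib
        nxt = best_mapping(models[name], alloc[name] + step_kib).dram_kib
        return now - nxt

    while free >= step_kib:
        winner = max(models, key=gain)
        if gain(winner) <= 0:
            break  # no model benefits from one more step
        alloc[winner] += step_kib
        free -= step_kib
    return alloc


# Made-up example: two co-located models sharing 768 KiB in 256 KiB steps.
models = {
    "resnet": [Mapping(0, 1000), Mapping(256, 750), Mapping(512, 600)],
    "bert": [Mapping(0, 2000), Mapping(256, 1200), Mapping(512, 900)],
}
alloc = allocate(models, total_kib=768, step_kib=256)
total_traffic = sum(best_mapping(models[n], alloc[n]).dram_kib for n in models)
print(alloc, total_traffic)
```

In this made-up example the allocator gives more cache to the model whose traffic drops fastest, and each model then runs the mapping that fits its region. A greedy step-by-step policy can miss gains that only appear after several steps; a runtime scheme would also have to re-run the allocation as models arrive and finish.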

Related Organizations

Beihang University
China (People's Republic of)
Tsinghua University
China (People's Republic of)

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
