ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance

Name: ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance
Keywords: FOS: Computer and information sciences, Hardware Architecture (cs.AR), Computer Science - Hardware Architecture

Xie, Tong; Zhao, Jiawang; Wan, Zishen; Zhang, Zuodong; Wang, Yuan; Wang, Runsheng; Huang, Ru; Li, Meng

Found an issue? Give us feedback

arXiv.org e-Print Ar...arrow_drop_down

arXiv.org e-Print Archive

Preprint . 2025

Data sources: arXiv.org e-Print Archive

https://doi.org/10.1109/dac638...

Article . 2025 . Peer-reviewed

License: STM Policy #29

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2025

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance

descriptionPublicationkeyboard_double_arrow_right Article , Preprint 22 Jun 2025Embargo end date: 01 Jan 2025Publisher:IEEEJournal:2025 62nd ACM/IEEE Design Automation Conference (DAC)

Authors: Xie, Tong; Zhao, Jiawang; Wan, Zishen; Zhang, Zuodong; Wang, Yuan; Wang, Runsheng; Huang, Ru; +1 Authors

doi: 10.1109/dac63849.2025.11132610 , 10.48550/arxiv.2503.24053

arXiv: 2503.24053

ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance

- Summary
- Subjects
- Metrics

Abstract

The demand for efficient large language model (LLM) inference has propelled the development of dedicated accelerators. As accelerators are vulnerable to hardware faults due to aging, variation, etc, existing accelerator designs often reserve a large voltage margin or leverage algorithm-based fault tolerance (ABFT) techniques to ensure LLM inference correctness. However, previous methods often overlook the inherent fault tolerance of LLMs, leading to high computation and energy overhead. To enable reliable yet efficient LLM inference, in this paper, we propose a novel algorithm/circuit co-design framework, dubbed ReaLM. For the first time, we systematically characterize the fault tolerance of LLMs by performing a large-scale error injection study of representative LLMs and natural language understanding tasks. Then, we propose a statistical ABFT algorithm that fully leverages the error robustness to minimize error recovery as much as possible. We also customize the error detection circuits to enable a low-cost online collection of error statistics. Extensive experiments show that with only 1.42% circuit area and 1.79% power overhead, our ReaLM can reduce perplexity degradation from 18.54 to 0.29. Compared to existing methods, ReaLM consistently reduces recovery costs across different operating voltages and improves energy efficiency by up to 35.83% without compromising LLM performance. Our error injection code is available at https://github.com/PKU-SEC-Lab/ReaLM_DAC25/

6 pages, 10 figures. Accepted by Design Automation Conference (DAC) 2025

Related Organizations

Peking University
China (People's Republic of)
Georgia Institute of Technology
United States

Keywords

FOS: Computer and information sciences, Hardware Architecture (cs.AR), Computer Science - Hardware Architecture

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

0

Average

Green