Name: Custom Algorithm-based Fault Tolerance for Attention Layers in Transformers
Keywords: Machine Learning, Hardware Architecture, FOS: Computer and information sciences, Hardware Architecture (cs.AR), Machine Learning (cs.LG)

Type: Article, Preprint
Date: 29 Sep 2025
Embargo end date: 01 Jan 2025
Publisher: IEEE
Journal: 2025 IEEE 38th International System-on-Chip Conference (SOCC)

Authors: Titopoulos, Vasileios; Alexandridis, Kosmas; Dimitrakopoulos, Giorgos

doi: 10.1109/socc66126.2025.11235399, 10.48550/arxiv.2507.16676

arXiv: 2507.16676

Abstract

Transformers and large language models (LLMs), powered by the attention mechanism, have transformed numerous AI applications, driving the need for specialized hardware accelerators. A major challenge in these accelerators is efficiently detecting errors caused by random hardware faults. Traditional algorithm-based fault tolerance (ABFT) techniques verify individual matrix multiplications but fall short in handling the full attention mechanism, particularly because of the intermediate softmax normalization. This work proposes Flash-ABFT, a novel method that computes an online checksum across the entire three-matrix product of the query, key, and value matrices of an attention layer, including the softmax operation, and verifies it with a single check. This approach significantly reduces overhead by eliminating redundant checks while maintaining high fault-detection accuracy. Experimental results demonstrate that Flash-ABFT incurs only 5.3% hardware area overhead and less than 1.9% energy overhead, making it a cost-effective and robust solution for error detection in attention accelerators.
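
To make the idea of one checksum spanning the whole attention computation more concrete, the following is a minimal Python sketch. It is an illustration based only on the abstract, not the authors' hardware design or their exact formulation. It assumes a standard scaled dot-product attention evaluated with a running (online) softmax, and it assumes the checksum is the sum of all output elements, predicted from the row sums of the value matrix. The names `attention_with_checksum` and `single_check` and the tolerance `tol` are hypothetical.

```python
import math

def attention_with_checksum(Q, K, V):
    """Online-softmax attention on lists of row vectors.

    Returns the attention outputs and, per query, a predicted checksum
    (assumed here to be the sum of that query's output row).
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    v_rowsum = [sum(row) for row in V]  # one checksum scalar per value row
    outputs, predicted = [], []
    for q in Q:
        m = float("-inf")        # running maximum of the scores
        l = 0.0                  # running softmax denominator
        o = [0.0] * len(V[0])    # running unnormalized output row
        c = 0.0                  # running unnormalized checksum
        for k, v, vs in zip(K, V, v_rowsum):
            s = scale * sum(a * b for a, b in zip(q, k))
            m_new = max(m, s)
            alpha = math.exp(m - m_new)  # rescales earlier partial sums
            p = math.exp(s - m_new)      # unnormalized softmax weight
            l = l * alpha + p
            o = [oi * alpha + p * vi for oi, vi in zip(o, v)]
            c = c * alpha + p * vs       # checksum follows the same recurrence
            m = m_new
        outputs.append([oi / l for oi in o])
        predicted.append(c / l)
    return outputs, predicted

def single_check(outputs, predicted, tol=1e-9):
    """Compare the sum of all output elements with the predicted checksum."""
    actual = sum(sum(row) for row in outputs)
    return abs(actual - sum(predicted)) <= tol
```

In this sketch the checksum accumulator `c` is updated with the same rescaling as the output row, so no separate check of the score matrix or of the softmax result is needed. Calling `single_check` on unmodified outputs should pass, while perturbing any single output element by more than `tol` should make it fail.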

IEEE International System-on-Chip Conference (IEEE SOCC 2025)

Impact by BIP!

- selected citations: 0
- popularity: Average
- influence: Average
- impulse: Average
