
arXiv: 2405.04620
In this work, we present a generalized formulation of the Transformer algorithm by reinterpreting its core mechanisms within the framework of the Path Integral formalism. In this perspective, the attention mechanism is recast as a process that integrates over all possible transition paths leading to future token states, with temporal evolution governed by the Feed-Forward Network. By systematically mapping each component of the Transformer to its counterpart in the Path Integral formulation, we obtain a more compact and efficient representation, in which the contextual information of a sequence is condensed into memory-like segments. These segments are recurrently processed across Transformer layers, enabling more effective long-term information retention. We validate the effectiveness of this approach on the passkey retrieval task and a summarization task, demonstrating that the proposed method preserves historical information while exhibiting memory usage that scales linearly with sequence length. This contrasts with the quadratic memory growth of standard attention mechanisms. We expect that this quantum-inspired generalization of the Transformer architecture will open new avenues for enhancing both the efficiency and expressiveness of future Transformer models.
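The core idea described above (condensing past context into fixed-size, memory-like segments that are carried forward recurrently, so that per-step attention cost stays bounded) can be illustrated with a rough sketch. This is not the paper's actual algorithm; the segment length, number of memory slots, and the pooling-based condensation rule below are all assumptions chosen only to make the linear-memory behavior concrete.

```python
import torch

# Illustrative sketch of segment-wise attention with a condensed, fixed-size memory.
# Each segment attends to [memory, current segment], then the segment is pooled into
# the memory slots. Per-segment memory is O(num_mem_slots + segment_len), so total
# memory grows linearly with sequence length rather than quadratically.

def condensed_segment_attention(x, segment_len=128, num_mem_slots=16, d_model=64):
    """x: (seq_len, d_model) token embeddings; returns (seq_len, d_model) outputs."""
    Wq = torch.randn(d_model, d_model) / d_model ** 0.5
    Wk = torch.randn(d_model, d_model) / d_model ** 0.5
    Wv = torch.randn(d_model, d_model) / d_model ** 0.5

    memory = torch.zeros(num_mem_slots, d_model)  # condensed past context, fixed size
    outputs = []
    for start in range(0, x.shape[0], segment_len):
        seg = x[start:start + segment_len]                     # current segment
        ctx = torch.cat([memory, seg], dim=0)                  # memory + fresh tokens
        q, k, v = seg @ Wq, ctx @ Wk, ctx @ Wv
        attn = torch.softmax(q @ k.T / d_model ** 0.5, dim=-1)  # (seg, mem + seg)
        outputs.append(attn @ v)

        # Condense the just-processed segment into the memory slots by mean-pooling
        # (one simple choice of condensation; the paper's rule may differ).
        chunks = torch.chunk(seg, num_mem_slots, dim=0)
        pooled = torch.stack([c.mean(dim=0) for c in chunks])
        memory = 0.5 * memory[:pooled.shape[0]] + 0.5 * pooled  # blend old and new
        if pooled.shape[0] < num_mem_slots:                     # keep memory size fixed
            pad = torch.zeros(num_mem_slots - pooled.shape[0], d_model)
            memory = torch.cat([memory, pad], dim=0)
    return torch.cat(outputs, dim=0)

out = condensed_segment_attention(torch.randn(1024, 64))
print(out.shape)  # torch.Size([1024, 64])
```

In this sketch the attention matrix per segment is at most segment_len x (num_mem_slots + segment_len), independent of how many segments have already been processed, which is what gives the linear overall scaling the abstract contrasts with standard full attention.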
11 pages, 12 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); High Energy Physics - Phenomenology (hep-ph)
Keywords: attention mechanism, path integral, quantum mechanics, feed-forward network (FNN), condensation, memory-efficient
