
arXiv: 2405.04620
In this work, we present a generalized formulation of the Transformer algorithm by reinterpreting its core mechanisms within the framework of the Path Integral formalism. In this perspective, the attention mechanism is recast as a process that integrates over all possible transition paths leading to future token states, with temporal evolution governed by the Feed-Forward Network. By systematically mapping each component of the Transformer to its counterpart in the Path Integral formulation, we obtain a more compact and efficient representation, in which the contextual information of a sequence is condensed into memory-like segments. These segments are recurrently processed across Transformer layers, enabling more effective long-term information retention. We validate the effectiveness of this approach on the passkey retrieval task and a summarization task, demonstrating that the proposed method preserves historical information while exhibiting memory usage that scales linearly with sequence length. This contrasts with the quadratic memory growth of standard attention mechanisms. We expect that this quantum-inspired generalization of the Transformer architecture will open new avenues for enhancing both the efficiency and expressiveness of future Transformer models.
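The core idea described above (condensing past context into fixed-size, memory-like segments that are carried forward recurrently, so that per-step attention cost stays bounded) can be illustrated with a rough sketch. This is not the paper's actual algorithm; the segment length, number of memory slots, and the pooling-based condensation rule below are all assumptions chosen only to make the linear-memory behavior concrete.

```python
import torch

# Illustrative sketch of segment-wise attention with a condensed, fixed-size memory.
# Each segment attends to [memory, current segment], then the segment is pooled into
# the memory slots. Per-segment memory is O(num_mem_slots + segment_len), so total
# memory grows linearly with sequence length rather than quadratically.

def condensed_segment_attention(x, segment_len=128, num_mem_slots=16, d_model=64):
    """x: (seq_len, d_model) token embeddings; returns (seq_len, d_model) outputs."""
    Wq = torch.randn(d_model, d_model) / d_model ** 0.5
    Wk = torch.randn(d_model, d_model) / d_model ** 0.5
    Wv = torch.randn(d_model, d_model) / d_model ** 0.5

    memory = torch.zeros(num_mem_slots, d_model)  # condensed past context, fixed size
    outputs = []
    for start in range(0, x.shape[0], segment_len):
        seg = x[start:start + segment_len]                     # current segment
        ctx = torch.cat([memory, seg], dim=0)                  # memory + fresh tokens
        q, k, v = seg @ Wq, ctx @ Wk, ctx @ Wv
        attn = torch.softmax(q @ k.T / d_model ** 0.5, dim=-1)  # (seg, mem + seg)
        outputs.append(attn @ v)

        # Condense the just-processed segment into the memory slots by mean-pooling
        # (one simple choice of condensation; the paper's rule may differ).
        chunks = torch.chunk(seg, num_mem_slots, dim=0)
        pooled = torch.stack([c.mean(dim=0) for c in chunks])
        memory = 0.5 * memory[:pooled.shape[0]] + 0.5 * pooled  # blend old and new
        if pooled.shape[0] < num_mem_slots:                     # keep memory size fixed
            pad = torch.zeros(num_mem_slots - pooled.shape[0], d_model)
            memory = torch.cat([memory, pad], dim=0)
    return torch.cat(outputs, dim=0)

out = condensed_segment_attention(torch.randn(1024, 64))
print(out.shape)  # torch.Size([1024, 64])
```

In this sketch the attention matrix per segment is at most segment_len x (num_mem_slots + segment_len), independent of how many segments have already been processed, which is what gives the linear overall scaling the abstract contrasts with standard full attention.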
11 pages, 12 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); High Energy Physics - Phenomenology (hep-ph)
Keywords: attention mechanism, path integral, quantum mechanics, feed-forward network (FNN), condensation, memory-efficient
