
The AI Runtime Infrastructure (AIRI) is a foundational layer of distributed architecture designed to execute large-scale AI workloads. Most modern distributed architectures, heavily influenced by cloud-native design principles, target stateless, deterministic, synchronous, microservices-based workloads. As a result, they do not efficiently manage the stateful, probabilistic, and adaptive workloads that AI execution entails. AIRI is proposed as a runtime layer and reference architecture that provides application-agnostic support across compute, storage, and networking infrastructure. It covers core runtime responsibilities such as model lifecycle management, orchestration of heterogeneous accelerators, cross-model coordination, and inference-time policy enforcement. The architecture also includes control-plane capabilities, such as model-aware routing, that aid efficiency and governance, and data-plane capabilities, including feature servers, embedding infrastructure, and vector search. Engineering challenges include multi-model coherence, runtime safety, model-aware scheduling, dynamic batching, and fair scheduling in multi-tenant environments. Just as virtualization and container orchestration did for previous generations of computing, AIRI treats AI workloads as first-class distributed-system workloads that require a dedicated runtime and layered abstractions to perform well. It eases the scalable, reliable, and efficient deployment of generative models, multimodal systems, and agentic architectures across diverse cloud-native environments. This paper presents a layered architectural model for AIRI, identifies key engineering challenges, and discusses implications for future distributed systems infrastructure.
