International Journa...
ZENODO
Article, 2026
License: CC BY
Data sources: Datacite

AI Runtime Infrastructure: Establishing a Foundational Layer for Distributed AI Systems

Author: Ashutosh Shanker

Abstract

The AI Runtime Infrastructure (AIRI) is a foundational layer of distributed-system architecture designed to execute large-scale AI workloads. Most modern distributed architectures, heavily influenced by cloud-native design principles, target stateless, deterministic, synchronous, microservice-based workloads. As a result, they do not efficiently manage the stateful, probabilistic, and adaptive workloads that AI execution entails. AIRI is proposed as a runtime layer and reference architecture that provides application-agnostic support across compute, storage, and networking infrastructure. It covers core runtime responsibilities such as model lifecycle management, orchestration of heterogeneous accelerators, cross-model coordination, and inference-time policy enforcement. The architecture also includes control-plane capabilities, such as model-aware routing, that improve efficiency and governance, and data-plane capabilities, including feature servers, embedding infrastructure, and vector search. Key engineering challenges include multi-model coherence, runtime safety, model-aware scheduling, dynamic batching, and fair scheduling in multi-tenant environments. Just as virtualization and container orchestration did for earlier generations of computing, AIRI establishes AI workloads as first-class distributed-system workloads that require a dedicated runtime and layered abstractions to perform well. It simplifies the scalable, reliable, and efficient deployment of generative models, multimodal systems, and agentic architectures across diverse cloud-native environments. This paper presents a layered architectural model for AIRI, identifies key engineering challenges, and discusses implications for future distributed-systems infrastructure.
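The abstract lists model-aware scheduling, dynamic batching, and fair scheduling in multi-tenant environments as engineering challenges but does not specify an algorithm for them. Purely as an illustration, the Python sketch below assumes one simple combination: requests are queued per model and per tenant, and each batch for a given model is filled by serving active tenants round-robin up to a size cap. The names `Request`, `FairModelBatcher`, and `max_batch_size` are hypothetical and are not taken from the paper's AIRI design.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Request:
    tenant: str   # hypothetical tenant identifier
    model: str    # hypothetical name of the model the request targets
    payload: str


class FairModelBatcher:
    """Hypothetical sketch: model-aware batching with round-robin fairness
    across tenants. Not the AIRI implementation."""

    def __init__(self, max_batch_size: int) -> None:
        self.max_batch_size = max_batch_size
        # model -> tenant -> FIFO queue of pending requests
        self.queues = defaultdict(lambda: defaultdict(deque))
        # model -> ring of tenants that currently have pending requests
        self.ring = defaultdict(deque)

    def submit(self, req: Request) -> None:
        tenant_queue = self.queues[req.model][req.tenant]
        if not tenant_queue:
            # Tenant becomes active for this model: join the back of the ring.
            self.ring[req.model].append(req.tenant)
        tenant_queue.append(req)

    def next_batch(self, model: str) -> list[Request]:
        """Take one request per tenant in ring order until the batch is full
        or no tenant has work left for this model."""
        batch: list[Request] = []
        ring = self.ring[model]
        while ring and len(batch) < self.max_batch_size:
            tenant = ring.popleft()
            tenant_queue = self.queues[model][tenant]
            batch.append(tenant_queue.popleft())
            if tenant_queue:
                # Tenant still has work: requeue it behind the other tenants.
                ring.append(tenant)
        return batch


if __name__ == "__main__":
    batcher = FairModelBatcher(max_batch_size=3)
    for i in range(4):
        batcher.submit(Request("tenant-a", "llm-small", f"a{i}"))
    batcher.submit(Request("tenant-b", "llm-small", "b0"))
    batcher.submit(Request("tenant-b", "embedder", "e0"))
    print([r.payload for r in batcher.next_batch("llm-small")])
    print([r.payload for r in batcher.next_batch("embedder")])
```

In this example, tenant-b's single request is placed in the first `llm-small` batch even though tenant-a submitted four requests first, and the `embedder` request is batched separately because batches are formed per model. Real dynamic batchers typically also flush a partial batch after a maximum wait time; this sketch leaves that timing decision to the caller.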
