
AI pipelines built around large language models (LLMs) are often treated as deterministic systems, but in practice they behave as probabilistic distributed systems. This paper presents a distributed-systems-inspired framework for managing non-determinism in production AI inference pipelines. We introduce Probabilistic Compute Graphs (PCGs), identify key sources of variability, and propose five architectural principles (versioning, tracing, replay, quorum validation, and guardrails), instantiated in a two-plane architecture that separates inference from reliability infrastructure. The framework offers a practical approach to improving reproducibility, observability, and consistency in systems such as retrieval-augmented generation (RAG) and multi-agent pipelines. This is a position and systems-design paper: it focuses on the runtime reliability of inference pipelines rather than on training-time reproducibility.

Version 2: Formatting improvements and layout refinements. No changes to technical content.
