ZENODO
Article . 2026
License: CC BY
Data sources: ZENODO

GUARDRAIL-CENTRIC FINE-TUNING FOR DETERMINISTIC DECISION SYSTEMS

Authors: Davenport

Abstract

This paper introduces Guardrail-Centric Fine-Tuning, a novel paradigm for safely deploying large language models (LLMs) in deterministic, constraint-heavy operational decision systems, using inventory replenishment in a distribution environment as a practical testbed. Fine-tuning a model on item-specific outcomes often leads to brittle generalization, loss of reasoning capability, and silent failures. The approach instead aligns a quantized Qwen2.5-Coder-14B model to approximately fifty generalized, domain-agnostic behavioral guardrails that enforce strict reasoning boundaries, constraint hierarchies, and audit requirements. The model is paired with a deterministic Python enforcement layer that handles all numerical calculations and hard rules, so the hybrid architecture separates probabilistic reasoning from exact execution and yields stable, explainable, and auditable ordering recommendations across diverse product catalogs. Empirical results demonstrate enhanced robustness, preservation of general capabilities, and elimination of common fine-tuning pitfalls (such as trigger-target confusion or degraded states). These results underscore that constraining how models reason, rather than dictating what outcomes they produce, is a more reliable strategy for enterprise-grade AI deployment in high-stakes domains like supply chain management.
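
To make the split between probabilistic reasoning and exact execution concrete, the following is a minimal sketch of what a deterministic enforcement layer could look like. It is not the paper's implementation. Every name here (ItemState, Decision, enforce, the "order"/"hold" action labels, and the specific rules for reorder point, pack rounding, and order cap) is a hypothetical assumption chosen for illustration. The model is assumed to propose only an action label, while all quantities and hard rules are computed in plain Python and recorded in an audit trail.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ItemState:
    """Hypothetical inventory snapshot for one item (illustrative fields)."""
    on_hand: int
    on_order: int
    reorder_point: int
    target_level: int
    pack_size: int
    max_order_qty: int


@dataclass
class Decision:
    order_qty: int
    audit: list = field(default_factory=list)


def enforce(item: ItemState, model_action: str) -> Decision:
    """Assumed enforcement step: the model supplies only an action label;
    every number is computed here, deterministically."""
    audit = [f"model_action={model_action}"]

    # Hard rule: anything outside the allowed vocabulary is rejected.
    if model_action not in {"order", "hold"}:
        audit.append("rejected: unknown action, defaulting to hold")
        return Decision(0, audit)

    position = item.on_hand + item.on_order
    # Hard rule: never order while the inventory position is above the
    # reorder point, regardless of what the model proposed.
    if model_action == "hold" or position > item.reorder_point:
        audit.append(
            f"no order: position={position}, reorder_point={item.reorder_point}"
        )
        return Decision(0, audit)

    # Exact arithmetic: fill up to the target level in whole packs.
    shortfall = max(0, item.target_level - position)
    qty = math.ceil(shortfall / item.pack_size) * item.pack_size

    # Hard rule: cap at the maximum order quantity, rounded down to whole packs.
    if qty > item.max_order_qty:
        qty = (item.max_order_qty // item.pack_size) * item.pack_size
        audit.append("capped at max_order_qty")

    audit.append(f"order_qty={qty}")
    return Decision(qty, audit)


item = ItemState(on_hand=12, on_order=0, reorder_point=20,
                 target_level=50, pack_size=12, max_order_qty=36)
decision = enforce(item, "order")
print(decision.order_qty, decision.audit)
```

In this sketch a malformed or out-of-vocabulary model output can never produce an order, and the same inputs always yield the same quantity and audit entries, which is the property the abstract attributes to the deterministic layer.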
