ZENODO
Preprint . 2026
License: CC BY
Data sources: ZENODO
ZENODO
Preprint . 2026
License: CC BY
Data sources: Datacite

Reasoning-Constraint Elasticity under Dynamic Feasible Sets

Auxiliary-Witness Falsification, Tail-Evidence Preservation, and Ruin-Bounded Barbell Control: An Observable-Only, No-Meta Theory for Autonomous and General Adaptive Systems
Authors: Marc and Gemini, also known as Shinkidan; Takahashi, K.


Abstract

Alignment interventions can improve policy adherence and harm mitigation while also inducing over-refusal and benign capability regressions. In many deployments, base checkpoints or matched base logs are unavailable, so the magnitude of restriction relative to the base model cannot be identified from present observables. We develop a reference-free alternative: Reasoning-Constraint Elasticity (RCE) over dynamic feasible sets. Capability is represented as simultaneous satisfaction of observable inequality families with context- and time-varying thresholds; feasibility is summarized by minimum slack. The primary objects are finite-difference elasticities (e.g., Δ𝑅/Δ𝑀𝑔, −Δ𝐶/Δ𝑀𝑔), which remain robust to non-smooth regime transitions.

The framework is strengthened in fourteen directions.
(i) Predictable thresholds: non-anticipative, replayable threshold processes.
(ii) Slack decomposition: observed slack movement is decomposed into frozen-threshold movement plus a bounded threshold-drift contribution, with explicit active-constraint switching residuals.
(iii) Auxiliary-witness falsification: independently specified witness inequalities provide a falsifiable disagreement channel with non-redundancy certificates.
(iv) Goodhart/gaming resistance: public-vs-audit evaluator split with delayed audit selection, transfer-gap certification, and all-channel fail-closed gating.
(v) Attack resistance: append-only transcript commitments with quorum-signed roots and split-view incompatibility conditions.
(vi) Contamination robustness: 𝜀-contamination margins and ratio domains are corrected by adversarial-budget terms.
(vii) Tail evidence preservation (TEPP): tail candidates are first committed as immutable evidence objects before value judgment.
(viii) Delayed opportunity: immediate and horizon-𝐻 opportunity signals are unified through delayed re-evaluation with doubly robust correction.
(ix) Certified tail chance: rare contexts are counted as measurable chance only when net upside, reserve sufficiency, and depletion severity are jointly certified.
(x) Dual-layer tail-positive gate: a hard fail-closed ruin guard plus a bounded fail-open discovery guard.
(xi) Safe niche search: context is an optimizable variable under explicit viability constraints.
(xii) Cryptographic replay/reveal: replayable leaf schemas, delayed reveal transcripts, and VRF-based audit selectors.
(xiii) Barbell portfolio control: the dual-gate architecture is formalized as a ruin-bounded convex-opportunity portfolio with explicit exploration allocation, budget constraints, and skin-in-the-game agency symmetry.
(xiv) The Convexity Principle in Safety (CPS), referred to as the Shinkidan Principle: maximize bounded convex upside only under hard ruin constraints, with preserve-before-judge evidence discipline.

Results are observable-only and auditable. They define falsifiable measurement and reporting rules under declared assumptions, without reliance on inaccessible internal narratives. Although motivated by AI alignment, the formalism is system-level and applies to any adaptive decision process where only externally observable traces are admissible.
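
To make the abstract's basic objects concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes constraints of the form g_i <= tau_i, so that slack is tau_i - g_i, and treats the elasticity numerator and denominator as generic scalar observables, since the abstract does not define 𝑅, 𝐶, or 𝑀𝑔. The function names and the two-term split in `decompose_slack_change` are hypothetical illustrations; the split is a plain telescoping identity and omits the active-constraint switching residuals mentioned in item (ii).

```python
from typing import Sequence, Tuple


def min_slack(values: Sequence[float], thresholds: Sequence[float]) -> float:
    """Minimum slack over an inequality family g_i <= tau_i (assumed form).

    A non-negative result means every inequality holds at once.
    """
    if not values or len(values) != len(thresholds):
        raise ValueError("values and thresholds must be non-empty and equal length")
    return min(t - v for v, t in zip(values, thresholds))


def finite_difference_elasticity(
    y_before: float, y_after: float, m_before: float, m_after: float
) -> float:
    """Finite-difference ratio (delta y) / (delta m); no derivative is assumed."""
    dm = m_after - m_before
    if dm == 0:
        raise ValueError("the control magnitude did not change; ratio undefined")
    return (y_after - y_before) / dm


def decompose_slack_change(
    g0: Sequence[float],
    tau0: Sequence[float],
    g1: Sequence[float],
    tau1: Sequence[float],
) -> Tuple[float, float, float]:
    """Split the change in minimum slack into two telescoping terms.

    frozen: slack change with thresholds held at tau0.
    drift:  remaining change caused by moving thresholds from tau0 to tau1.
    bound:  max_i |tau1_i - tau0_i|, which bounds |drift| because the
            minimum is 1-Lipschitz in the sup norm.
    """
    frozen = min_slack(g1, tau0) - min_slack(g0, tau0)
    drift = min_slack(g1, tau1) - min_slack(g1, tau0)
    bound = max(abs(b - a) for a, b in zip(tau0, tau1))
    return frozen, drift, bound


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the preprint.
    print(min_slack([0.25, 0.5], [1.0, 0.75]))
    print(finite_difference_elasticity(1.0, 0.5, 1.0, 2.0))
    print(decompose_slack_change([0.25, 0.5], [1.0, 0.75], [0.5, 0.25], [0.75, 1.0]))
```

The frozen and drift terms always sum to the observed change in minimum slack, so the split can be checked directly against logged traces without access to a base model.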

Keywords

Artificial intelligence, Observable-Only, The Convexity Principle in Safety, Contamination Robustness, AI alignment, Shinkidan, Ruin-Bounded Barbell Control, Reasoning-Constraint Elasticity, Dynamic Feasible Sets, Auxiliary-Witness Falsification, Finite-Difference Elasticity, Tail-Evidence Preservation Protocol, Out-of-distribution (OOD), Goodhart Resistance
