
This paper derives a bifurcation-aware policy layer for the Learning System Stability Model (LSSM). The central construction is the Margin-Regulated bonus multiplier (MRER): beta(t) = beta_max · sigma(M(t)/M_max), where M = I_cap − L is the LSSM stability margin. Three formal results are established: (T1) MRER-UCB achieves O(ln T) cumulative regret with constant factor 1/beta_min²; (T2) MRER preserves the LSSM stability constraint L ≤ E·S_sys² under an explicit safe-action set condition; (T3) given LSSM bistability, MRER-UCB inherits hysteresis, producing a lower exploration bonus on the collapse branch than on the recovery branch at the same nominal load. Version 1.1.0 incorporates corrections following peer review by ChatGPT (OpenAI) and DeepSeek AI. All three results are empirically confirmed across 50 Monte Carlo runs in the companion software (DOI: 10.5281/zenodo.19005510).
Keywords: bifurcation-aware policy, reinforcement learning, hysteresis, stability constraint, regret bound, margin-regulated exploration, adaptive learning, safe bandits, LSSM, UCB
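
The MRER multiplier defined in the abstract can be illustrated with a minimal Python sketch. Only the formula beta(t) = beta_max · sigma(M(t)/M_max) and the margin M = I_cap − L come from the abstract. Everything else is an assumption made for illustration: sigma is taken to be the logistic sigmoid, the function names are hypothetical, and the way beta scales the exploration term in `ucb_index` follows the standard UCB1 bonus rather than the paper's own MRER-UCB definition, which the abstract does not state.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid. ASSUMPTION: the abstract's sigma is this function."""
    return 1.0 / (1.0 + math.exp(-x))


def stability_margin(i_cap: float, load: float) -> float:
    """LSSM stability margin M = I_cap - L, as given in the abstract."""
    return i_cap - load


def mrer_beta(margin: float, m_max: float, beta_max: float) -> float:
    """MRER multiplier beta = beta_max * sigma(M / M_max)."""
    return beta_max * sigmoid(margin / m_max)


def ucb_index(mean: float, pulls: int, t: int, beta: float) -> float:
    """Hypothetical index: empirical mean plus a beta-scaled UCB1 bonus.

    ASSUMPTION: the UCB1 bonus sqrt(2 ln t / n) is used here only as a
    placeholder for the paper's exploration term.
    """
    return mean + beta * math.sqrt(2.0 * math.log(t) / pulls)


# A smaller margin (load close to capacity) yields a smaller multiplier.
beta_low_margin = mrer_beta(stability_margin(1.0, 0.9), m_max=1.0, beta_max=2.0)
beta_high_margin = mrer_beta(stability_margin(1.0, 0.1), m_max=1.0, beta_max=2.0)
```

Under these assumptions the multiplier is monotone in the margin and bounded above by beta_max, so the exploration bonus shrinks as the load L approaches I_cap. A zero margin gives exactly beta_max / 2.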
