"Prudential Gating Function v3: Multi-Seed Validation of a Risk-Aware Reward Shaping Mechanism for Reinforcement Learning"

Rivera Garcia, Jose M

ZENODO

Report . 2025

License: CC BY

Data sources: ZENODO, Datacite

Función de Puerta Prudencial v3: Validación Multi-Semilla de un Mecanismo de Recompensa Sensible al Riesgo para Aprendizaje por Refuerzo

Report . 02 Dec 2025 . Spanish . Publisher: Zenodo

Authors: Rivera Garcia, Jose M

DOI: 10.5281/zenodo.17793348, 10.5281/zenodo.17793347

Abstract

I present a multi-seed validation of the Prudential Gating Function (PGF) v3, a reward shaping mechanism designed to induce risk-aware behavior in reinforcement learning agents operating in stochastic environments. Across three independent random seeds in a 5×5 gridworld with moderate risk conditions (risk_scale=1.5), PGF v3 achieves a mean performance ratio of 38.93% ± 0.59% relative to a risk-blind control agent, with high run-to-run reproducibility (coefficient of variation = 1.52%). This represents a cumulative +131.7% improvement over my initial baseline implementation.
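The reproducibility figure in the abstract can be checked from the reported summary statistics alone. The short sketch below is only an arithmetic check, not the report's analysis code: the function names are hypothetical, it assumes the ± 0.59% value is a standard deviation across the three seeds, and it assumes "+131.7% improvement" means the v3 ratio equals the baseline ratio multiplied by (1 + 1.317). Under that second assumption the implied baseline is a derived estimate, not a value stated in the abstract.

```python
def coefficient_of_variation(mean: float, std: float) -> float:
    """Return the coefficient of variation in percent (std / mean * 100)."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return std / mean * 100.0


def implied_baseline(current: float, improvement_pct: float) -> float:
    """Back out a baseline value from a relative improvement.

    Assumption (not stated in the abstract):
    current = baseline * (1 + improvement_pct / 100).
    """
    return current / (1.0 + improvement_pct / 100.0)


# Values reported in the abstract (performance ratio, in percent).
mean_ratio = 38.93
std_ratio = 0.59  # assumed to be a standard deviation over the 3 seeds

cv = coefficient_of_variation(mean_ratio, std_ratio)
print(f"CV = {cv:.2f}%")  # matches the reported 1.52%

baseline = implied_baseline(mean_ratio, 131.7)
print(f"Implied baseline ratio = {baseline:.2f}%")  # derived, hypothetical
```

With only three seeds, a low coefficient of variation indicates consistent runs but gives a wide confidence interval on the mean, so the figure is best read as a reproducibility check rather than a significance test.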

Keywords

reinforcement learning, learning, multi-seed reproducibility, safe reinforcement learning, risk modeling, alignment, alignment tax, statistical validation, artificial intelligence, prudential behavior, AI safety, risk-aware agents, reward shaping

Impact by BIP!

	selected citations	0
	popularity	Average
	influence	Average
	impulse	Average

Related to Research communities

Knowmad Institut