
We propose Paradox-Aware Reinforcement Learning (PA-RL), a framework for modeling and resolving the intrinsic paradoxes that arise in closed-loop time-series learning. We formalize four fundamental paradoxes (exploration–exploitation, credit assignment, sim-to-real transfer, and distribution shift) and develop a modular approach that makes the associated trade-offs explicit and measurable. We show how PA-RL integrates into sequential decision-making in performative environments, outline evaluation protocols with paradox-sensitive metrics, and highlight applied domains including finance, robotics, and healthcare. This preprint contributes a principled foundation for studying reinforcement learning in self-referential, feedback-altering settings.
