Entropy regularized reinforcement learning using large deviation theory

Publication: Article, Preprint
Date: 10 May 2023 (embargo end date: 01 Jan 2021)
Language: English
Publisher: American Physical Society (APS)
Journal: Physical Review Research, volume 5 (eISSN: 2643-1564)
Funded by: NSF | Large Deviations and Driv...

Authors: Argenis Arriojas; Jacob Adamczyk; Stas Tiomkin; Rahul V. Kulkarni

doi: 10.1103/physrevresearch.5.023085, 10.48550/arxiv.2106.03931

arXiv: 2106.03931

Abstract

Reinforcement learning (RL) is an important field of research in machine learning that is increasingly being applied to complex optimization problems in physics. In parallel, concepts from physics have contributed to important advances in RL with developments such as entropy-regularized RL. While these developments have led to advances in both fields, obtaining analytical solutions for optimization in entropy-regularized RL is currently an open problem. In this paper, we establish a mapping between entropy-regularized RL and research in non-equilibrium statistical mechanics focusing on Markovian processes conditioned on rare events. In the long-time limit, we apply approaches from large deviation theory to derive exact analytical results for the optimal policy and optimal dynamics in Markov Decision Process (MDP) models of reinforcement learning. The results obtained lead to a novel analytical and computational framework for entropy-regularized RL which is validated by simulations. The mapping established in this work connects current research in reinforcement learning and non-equilibrium statistical mechanics, thereby opening new avenues for the application of analytical and computational approaches from one field to cutting-edge problems in the other.
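The abstract gives no formulas, so the following is only a minimal sketch of the general large-deviation construction it alludes to, not the authors' implementation. It assumes a hypothetical toy MDP (2 states, 2 actions, random dynamics `p`, uniform prior policy `pi0`, random rewards `r`, inverse temperature `beta`). It tilts the prior transition matrix over state-action pairs by the exponentiated reward, takes the dominant eigenpair, and applies the standard generalized Doob transform to obtain the process conditioned on atypically high reward in the long-time limit. All names are illustrative.

```python
import numpy as np

def tilted_matrix(P0, r, beta):
    """Tilt a row-stochastic matrix P0 by exp(beta * r[x]) for each source row x."""
    return np.exp(beta * r)[:, None] * P0

def perron_pair(M):
    """Dominant eigenvalue and positive right eigenvector of a positive matrix M."""
    vals, vecs = np.linalg.eig(M)
    k = np.argmax(vals.real)          # Perron root has the largest real part
    rho = vals[k].real
    u = np.abs(vecs[:, k].real)       # Perron vector has entries of one sign
    return rho, u / u.sum()

def driven_process(M, rho, u):
    """Generalized Doob transform: P[x, y] = M[x, y] * u[y] / (rho * u[x])."""
    return M * u[None, :] / (rho * u[:, None])

# Hypothetical toy MDP (assumption: not taken from the paper).
n_states, n_actions = 2, 2
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # p[s, a, s']
pi0 = np.full((n_states, n_actions), 1.0 / n_actions)             # pi0[s', a']

# Prior dynamics over state-action pairs: P0[(s,a),(s',a')] = p[s,a,s'] * pi0[s',a'].
n = n_states * n_actions
P0 = (p[:, :, :, None] * pi0[None, None, :, :]).reshape(n, n)
r = rng.normal(size=n)   # reward collected on leaving each state-action pair
beta = 1.0               # inverse temperature (assumed value)

M = tilted_matrix(P0, r, beta)
rho, u = perron_pair(M)
P_driven = driven_process(M, rho, u)
scgf = np.log(rho)       # long-time growth rate of E[exp(beta * total reward)]
```

The driven matrix is again row-stochastic because `M @ u` equals `rho * u`. It re-weights whole state-action transitions, so extracting a policy from it requires further assumptions about the dynamics that this sketch does not address.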

Related Organizations

University of California, Berkeley (United States)
San Jose State University (United States)
University of Massachusetts System (United States)
University of Massachusetts Boston (United States)

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Statistical Mechanics (cond-mat.stat-mech), Computer Science - Artificial Intelligence, Physics, QC1-999, FOS: Physical sciences, Machine Learning (stat.ML), Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Statistics - Machine Learning, Computer Engineering, Condensed Matter - Statistical Mechanics

Impact by BIP!

selected citations: 1 (derived from selected sources)
popularity: Average (current attention in the research community, based on the underlying citation network)
influence: Average (overall impact in the research community, based on the underlying citation network)
impulse: Average (initial momentum directly after publication, based on the underlying citation network)

Funded by

NSF | Large Deviations and Driven Processes for Stochastic Models of Gene Expression and Its Regulation