Reinforcement Learning in Non-Markovian Environments

Publication type: Article, Preprint

Date: 01 Jan 2022 (embargo end date: 01 Jan 2022)

Language: English

Publisher: Elsevier BV

Journal: SSRN Electronic Journal (eISSN: 1556-5068)

Authors: Siddharth Chandak; Pratik Shah; Vivek S. Borkar; Parth Dodhia

doi: 10.2139/ssrn.4293001, 10.1016/j.sysconle.2024.105751, 10.48550/arxiv.2211.01595

arXiv: 2211.01595

Abstract

Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied to this formulation. Based on this observation, we propose that the criterion for agent design should be to seek good approximations for certain conditional laws. Inspired by classical stochastic control, we show that our problem reduces to that of recursive computation of approximate sufficient statistics. This leads to an autoencoder-based scheme for agent design, which is then numerically tested on partially observed reinforcement learning environments.
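The idea of running Q-learning on a recursively computed agent state can be illustrated with a minimal sketch. It is not the paper's autoencoder-based scheme: the toy cue-recall task, the hand-written update functions `observation_only` and `remember_first`, and all hyperparameters are assumptions made purely for illustration. A cue is observed once, then hidden, and the agent is rewarded for repeating it. An agent whose state is only the current observation cannot tell the two cases apart, whereas a recursively updated state that carries the cue forward can.

```python
import random
from collections import defaultdict

BLANK = 2  # observation emitted once the cue is no longer visible


def observation_only(prev_state, prev_action, obs):
    # Hypothetical agent state: just the latest observation (non-Markovian here).
    return obs


def remember_first(prev_state, prev_action, obs):
    # Hypothetical recursive statistic: carry the first observation forward.
    if prev_state is None:
        return (obs, obs)
    return (prev_state[0], obs)


def run_episode(q, update_state, epsilon, alpha, gamma, rng):
    """Run one two-step episode of the toy task and return the final reward."""
    cue = rng.randrange(2)        # hidden variable, visible only at t = 0
    observations = [cue, BLANK]
    state = update_state(None, None, observations[0])
    reward = 0.0
    for t in range(2):
        # Epsilon-greedy action selection on the current agent state.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max((0, 1), key=lambda a: q[(state, a)])
        last = t == 1
        # Reward is given only at the last step, for repeating the cue.
        reward = 1.0 if (last and action == cue) else 0.0
        if last:
            target = reward
        else:
            next_state = update_state(state, action, observations[t + 1])
            target = reward + gamma * max(q[(next_state, a)] for a in (0, 1))
        # Standard tabular Q-learning update, applied to the agent state.
        q[(state, action)] += alpha * (target - q[(state, action)])
        if not last:
            state = next_state
    return reward


def train_and_evaluate(update_state, episodes=5000, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        run_episode(q, update_state, epsilon=0.2, alpha=0.1, gamma=0.9, rng=rng)
    # Greedy evaluation; alpha = 0 leaves the learned Q-values unchanged.
    trials = 1000
    wins = sum(run_episode(q, update_state, 0.0, 0.0, 0.9, rng) for _ in range(trials))
    return wins / trials
```

Calling `train_and_evaluate(remember_first)` and `train_and_evaluate(observation_only)` compares the two agent states: with the cue carried forward the greedy policy solves the task, while the observation-only agent stays near chance because its final state is the same for both cues.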

19 pages, accepted for publication at Systems and Control Letters

Related Organizations

Indian Institute of Technology Bombay, India
Department of Electrical Engineering and Computer Science, University of Michigan, United States
Stanford University, United States

Keywords

FOS: Computer and information sciences, recursively computed sufficient statistics, Computer Science - Machine Learning, agent design, Multi-agent systems, Learning and adaptive systems in artificial intelligence, partially observed MDP, Stochastic learning and adaptive control, Systems and Control (eess.SY), Non-Markovian processes: estimation, Electrical Engineering and Systems Science - Systems and Control, Machine Learning (cs.LG), Q-learning, FOS: Electrical engineering, electronic engineering, information engineering, curse of non-Markovianity

Impact (by BIP!)

Selected citations: 9
Popularity: Top 10%
Influence: Top 10%
Impulse: Top 10%

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
