q-Learning in Continuous Time

Article, Preprint, 01 Jan 2022 (Embargo end date: 01 Jan 2022), English
Publisher: Elsevier BV; Journal: SSRN Electronic Journal (eISSN: 1556-5068)

Authors: Yanwei Jia; Xun Yu Zhou;

DOI: 10.2139/ssrn.4152195, 10.48550/arxiv.2207.00713

arXiv: 2207.00713


Abstract

We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.
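The abstract separates the algorithms by whether the density of the Gibbs measure generated from the q-function can be computed explicitly. A minimal sketch of that distinction, assuming a Gibbs policy of the form pi(a | t, x) ∝ exp(q(t, x, a) / gamma) with a temperature gamma > 0 at a fixed (t, x): the quadratic q-function below is a hypothetical example, chosen because it makes the Gibbs measure Gaussian with mean m and variance gamma / k, so its density is available in closed form. The generic fallback, normalizing exp(q / gamma) numerically on a uniform action grid, is a stand-in technique for illustration and not one of the paper's algorithms; the function names are hypothetical.

```python
import numpy as np


def gibbs_policy_on_grid(q_values: np.ndarray, actions: np.ndarray, gamma: float) -> np.ndarray:
    """Density of pi(a) proportional to exp(q(a) / gamma) on a uniform action grid.

    Assumption: the normalizing constant is not known in closed form, so it is
    approximated by a Riemann sum over the grid (illustrative stand-in only).
    """
    logits = q_values / gamma
    logits = logits - logits.max()          # shift before exp to avoid overflow
    weights = np.exp(logits)
    da = actions[1] - actions[0]            # uniform grid spacing
    return weights / (weights.sum() * da)   # normalize so the density integrates to ~1


def gaussian_density(actions: np.ndarray, mean: float, var: float) -> np.ndarray:
    """Closed-form normal density N(mean, var) evaluated at each action."""
    return np.exp(-((actions - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


# Hypothetical q-function, quadratic in the action at a fixed (t, x):
#   q(a) = -(k / 2) * (a - m)^2 + c
# so exp(q / gamma) is proportional to a Gaussian with mean m and variance gamma / k.
k, m, c, gamma = 2.0, 0.5, 1.3, 0.1
actions = np.linspace(-3.0, 4.0, 7001)      # grid wide enough to cover the Gaussian tails
q_values = -0.5 * k * (actions - m) ** 2 + c

numeric = gibbs_policy_on_grid(q_values, actions, gamma)
closed_form = gaussian_density(actions, m, gamma / k)
print("max abs difference:", np.max(np.abs(numeric - closed_form)))
```

The constant c cancels in the normalization, as it should for any Gibbs measure. When q is not quadratic in the action, only a numerical normalization like the grid sum applies, and its cost grows quickly with the action dimension, which is one plausible reason an explicitly computable Gibbs density is convenient.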

70 pages, 4 figures, appended with an erratum

Related Organizations

Columbia University, United States
Department of Industrial Engineering and Operations Research, Columbia University, United States

Keywords

q-function, FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Artificial Intelligence, martingale, Learning and adaptive systems in artificial intelligence, Computational Finance (q-fin.CP), policy improvement, on-policy and off-policy, Machine Learning (cs.LG), FOS: Economics and business, Quantitative Finance - Computational Finance, Artificial Intelligence (cs.AI), continuous-time reinforcement learning

Impact by BIP!

- Selected citations: 4. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Green

Fields of Science (4)

engineering and technology

electrical engineering, electronic engineering, information engineering
