Reinforcement with Fading Memories

Publication: Article, Preprint, Other literature type; 01 Jan 2016; Embargo end date: 01 Jan 2019; English; Publisher: Elsevier BV; Journal: SSRN Electronic Journal (eISSN: 1556-5068)

Authors: Kuang Xu; Se-Young Yun

doi: 10.2139/ssrn.2842284, 10.1287/moor.2019.1031, 10.1145/3219617.3219653, 10.1145/3292040.3219653, 10.48550/arxiv.1907.12227

arXiv: 1907.12227

Abstract

We study the effect of imperfect memory on decision making in the context of a stochastic sequential action-reward problem. An agent chooses a sequence of actions, which generate discrete rewards at different rates. She is allowed to make new choices at rate β, whereas past rewards disappear from her memory at rate μ. We focus on a family of decision rules where the agent makes a new choice by randomly selecting an action with a probability approximately proportional to the amount of past rewards associated with each action in her memory. We provide closed-form formulas for the agent’s steady-state choice distribution in the regime where the memory span is large (\(\mu \to 0\)) and show that the agent’s success critically depends on how quickly she updates her choices relative to the speed of memory decay. If \(\beta \gg \mu\), the agent almost always chooses the best action (that is, the one with the highest reward rate). Conversely, if \(\beta \ll \mu\), the agent chooses an action with a probability roughly proportional to its reward rate.
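The dynamics described in the abstract can be explored numerically. The following is a minimal simulation sketch, not the authors' code. It rests on assumptions the abstract does not state: rewards for the currently chosen action arrive as a Poisson process, each remembered reward is forgotten independently after an exponential time with rate μ, choice updates occur at Poisson rate β, and "approximately proportional" is modeled by adding a hypothetical smoothing constant `alpha` to each action's remembered reward count. The function name `simulate` and all parameter names are illustrative.

```python
import random


def simulate(rates, beta, mu, alpha=1.0, horizon=2000.0, seed=0):
    """Event-driven (Gillespie-style) simulation of the assumed model.

    rates   : reward rate of each action while it is the current choice
    beta    : rate at which the agent makes a new choice
    mu      : rate at which each remembered reward is forgotten
    alpha   : hypothetical smoothing constant (an assumption of this sketch)
    Returns the fraction of simulated time spent on each action.
    """
    rng = random.Random(seed)
    k = len(rates)
    memory = [0] * k            # remembered rewards, per action
    current = rng.randrange(k)  # arbitrary initial choice
    time_on = [0.0] * k
    t = 0.0
    while t < horizon:
        r_reward = rates[current]
        r_forget = mu * sum(memory)
        total_rate = r_reward + r_forget + beta
        # Time to the next event, truncated at the horizon.
        dt = min(rng.expovariate(total_rate), horizon - t)
        time_on[current] += dt
        t += dt
        if t >= horizon:
            break
        u = rng.random() * total_rate
        if u < r_reward:
            # A new reward is credited to the current action.
            memory[current] += 1
        elif u < r_reward + r_forget:
            # One remembered reward, chosen uniformly, is forgotten.
            idx = rng.choices(range(k), weights=memory)[0]
            memory[idx] -= 1
        else:
            # New choice, with probability proportional to memory + alpha.
            weights = [m + alpha for m in memory]
            current = rng.choices(range(k), weights=weights)[0]
    return [x / horizon for x in time_on]


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    for beta in (5.0, 0.005):
        fractions = simulate([1.0, 2.0], beta=beta, mu=0.05, horizon=5000.0)
        print(f"beta={beta}: time fractions {fractions}")
```

Comparing runs in which `beta` is much larger or much smaller than `mu` gives a rough feel for the two regimes in the abstract. A finite-horizon run only approximates the steady-state distribution, and the slow-update regime needs a much longer `horizon` than the one shown before the time fractions settle.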

Related Organizations

Korea Advanced Institute of Science and Technology
Korea (Republic of)
Stanford University

Keywords

reinforcement, FOS: Computer and information sciences, Computer Science - Machine Learning, Management decision making, including multiple objectives, 91E40 (Primary), 60J27 (Secondary), Probability (math.PR), Machine Learning (cs.LG), memory, \(M/M/\infty\) queue, FOS: Mathematics, Markov process, fluid model, stochastic model, Mathematics - Probability

Impact by BIP!

- selected citations: 3. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.