
Sequential Decision Making (SDM) problems optimize over the sequence of actions (or decisions) taken so as to minimize an underlying cumulative cost. This sequence of actions is referred to as the policy of the SDM problem. Often these problems comprise additional (fixed and manipulable) parameters, and the objective is to determine the optimal policy as well as the manipulable parameter values that minimize the SDM cost. In this paper we address the class of SDM problems characterized by dynamic parameters, where the dynamics are pre-specified for a subset of the parameters and manipulable for the others. The objective is to determine the manipulable parameter dynamics as well as the time-varying policy such that the associated SDM cost is minimized at each time instant. To this end, we develop a control-theoretic framework that designs the manipulable parameter dynamics so that they track the optimal parameter values while simultaneously determining the time-varying optimal policy. Our methodology builds upon a Maximum Entropy Principle (MEP) based framework for SDM problems. More precisely, this framework yields a smooth approximation of the SDM cost, which we employ as a control Lyapunov function. We show that under the resulting control law the parameters asymptotically track a local optimum, the control law is Lipschitz continuous and bounded, and the policy of the SDM problem is optimal for the given parameter values. Simulations demonstrate the efficacy of the proposed methodology.
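To make the abstract's core idea concrete, the following is a minimal sketch (not the paper's actual algorithm): an MEP-style smooth approximation of a minimum-cost objective via a soft-min (free-energy) function, the corresponding Gibbs-distribution policy, and a gradient-descent control law on the manipulable parameter under which the smoothed cost decreases along trajectories and thus acts as a Lyapunov-like function. The cost model, gains, and function names here are illustrative assumptions, not from the paper.

```python
import numpy as np

def free_energy(costs, beta):
    # MEP-style smooth approximation of min_a costs[a]:
    # F = -(1/beta) * log(sum_a exp(-beta * costs[a])); F -> min as beta -> inf
    m = costs.min()  # shift for numerical stability
    return m - np.log(np.sum(np.exp(-beta * (costs - m)))) / beta

def gibbs_policy(costs, beta):
    # Policy that is optimal for the smoothed (entropy-regularized) problem:
    # a Gibbs distribution over actions, weighting low-cost actions more heavily.
    w = np.exp(-beta * (costs - costs.min()))
    return w / w.sum()

def numerical_grad(f, x, eps=1e-5):
    # Central-difference gradient (illustrative stand-in for an analytic gradient).
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def parameter_flow(theta, cost_fn, beta, gain=1.0, dt=1e-2, steps=2000):
    # Discretized gradient-descent control law theta_dot = -gain * grad F(theta).
    # F decreases along trajectories, so it serves as a Lyapunov-like function,
    # and theta is driven toward a local minimizer of the smoothed cost.
    for _ in range(steps):
        grad = numerical_grad(lambda t: free_energy(cost_fn(t), beta), theta)
        theta = theta - dt * gain * grad
    return theta
```

For example, if each action's cost depends quadratically on a scalar parameter with a common minimizer, the flow drives the parameter toward that minimizer while `gibbs_policy` supplies the (smoothed-)optimal action distribution at every instant.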
Automatica, 148
ISSN: 0005-1098
Sequential Decision Making; Markov decision processes; Markov and semi-Markov decision processes; maximum entropy principle; network design; decision theory; parameterized state and action spaces
