
Sequential Decision Making (SDM) problems optimize over the sequence of actions (or decisions) taken so as to minimize an underlying cumulative cost. This sequence of actions is referred to as the policy of the SDM problem. Often these problems comprise additional (fixed and manipulable) parameters, and the objective is to determine the optimal policy as well as the manipulable parameter values that minimize the SDM cost. In this paper we address the class of SDM problems characterized by dynamic parameters, where the dynamics are pre-specified for a subset of the parameters and manipulable for the others. The objective is to determine the manipulable parameter dynamics as well as the time-varying policy such that the associated SDM cost is minimized at each time instant. To this end, we develop a control-theoretic framework that designs the manipulable parameter dynamics so that they track the optimal parameter values while simultaneously determining the time-varying optimal policy. Our methodology builds upon a Maximum Entropy Principle (MEP) based framework for SDM problems. More precisely, this framework yields a smooth approximation of the SDM cost, which we employ as a control Lyapunov function. We show that under the resulting control law the parameters asymptotically track a local optimum, the control law is Lipschitz continuous and bounded, and the policy of the SDM problem is optimal for the given parameter values. Simulations demonstrate the efficacy of the proposed methodology.
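To make the abstract's core idea concrete, the following is a minimal sketch (not the paper's actual algorithm): an MEP-style smooth approximation of a minimum-cost objective via a soft-min (free-energy) function, the corresponding Gibbs-distribution policy, and a gradient-descent control law on the manipulable parameter under which the smoothed cost decreases along trajectories and thus acts as a Lyapunov-like function. The cost model, gains, and function names here are illustrative assumptions, not from the paper.

```python
import numpy as np

def free_energy(costs, beta):
    # MEP-style smooth approximation of min_a costs[a]:
    # F = -(1/beta) * log(sum_a exp(-beta * costs[a])); F -> min as beta -> inf
    m = costs.min()  # shift for numerical stability
    return m - np.log(np.sum(np.exp(-beta * (costs - m)))) / beta

def gibbs_policy(costs, beta):
    # Policy that is optimal for the smoothed (entropy-regularized) problem:
    # a Gibbs distribution over actions, weighting low-cost actions more heavily.
    w = np.exp(-beta * (costs - costs.min()))
    return w / w.sum()

def numerical_grad(f, x, eps=1e-5):
    # Central-difference gradient (illustrative stand-in for an analytic gradient).
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def parameter_flow(theta, cost_fn, beta, gain=1.0, dt=1e-2, steps=2000):
    # Discretized gradient-descent control law theta_dot = -gain * grad F(theta).
    # F decreases along trajectories, so it serves as a Lyapunov-like function,
    # and theta is driven toward a local minimizer of the smoothed cost.
    for _ in range(steps):
        grad = numerical_grad(lambda t: free_energy(cost_fn(t), beta), theta)
        theta = theta - dt * gain * grad
    return theta
```

For example, if each action's cost depends quadratically on a scalar parameter with a common minimizer, the flow drives the parameter toward that minimizer while `gibbs_policy` supplies the (smoothed-)optimal action distribution at every instant.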
Automatica, 148
ISSN: 0005-1098
Sequential Decision Making; Markov decision processes; Markov and semi-Markov decision processes; maximum entropy principle; network design; decision theory; parameterized state and action spaces
