
In Chapter 2, we talked about the parts of the setup that form the agent and the parts that form the environment. The agent observes the state S_t = s and learns a policy π(a|s) that maps states to actions. The agent uses this policy to take an action A_t = a when in state S_t = s. The system then transitions to the next time instant, t + 1. The environment responds to the action A_t = a by putting the agent in a new state S_{t+1} = s' and providing feedback in the form of a reward, R_{t+1}. The agent has no control over what the new state S_{t+1} and reward R_{t+1} will be. This transition, (S_t = s, A_t = a) → (R_{t+1} = r, S_{t+1} = s'), is governed by the environment and is known as the transition dynamics. For a given pair (s, a), there could be one or more possible pairs (r, s'). In a deterministic world, a fixed combination (s, a) always produces the same single pair (r, s'). However, in stochastic environments, i.e., environments with uncertain outcomes, many different pairs (r, s') are possible for a given (s, a).
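To make this concrete, here is a minimal Python sketch of stochastic transition dynamics for a hypothetical two-state environment. The states, actions, rewards, and probabilities below are illustrative assumptions, not taken from the text; the point is only the structure: each (s, a) pair maps to a set of possible (r, s') outcomes with associated probabilities, and the environment samples one outcome per step.

```python
import random

# Hypothetical transition dynamics for a toy two-state environment.
# Each (s, a) pair maps to a list of ((r, s'), probability) outcomes.
# All values here are made up for illustration.
P = {
    ("s0", "a0"): [((1.0, "s0"), 0.7), ((0.0, "s1"), 0.3)],
    ("s0", "a1"): [((0.0, "s1"), 1.0)],   # deterministic: one outcome
    ("s1", "a0"): [((5.0, "s0"), 0.5), ((-1.0, "s1"), 0.5)],
    ("s1", "a1"): [((0.0, "s1"), 1.0)],
}

def step(s, a):
    """Environment's response to action a taken in state s:
    sample a (reward, next-state) pair from the dynamics."""
    outcomes, probs = zip(*P[(s, a)])
    (r, s_next), = random.choices(outcomes, weights=probs, k=1)
    return r, s_next

# The agent picks the action, but has no control over which
# (r, s') the environment returns:
r, s_next = step("s0", "a0")
print(f"reward={r}, next state={s_next}")
```

Note that the deterministic case is just the special case where every (s, a) pair has a single outcome with probability 1.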
