
In Chapter 2, we talked about the parts of the setup that form the agent and the parts that form the environment. The agent observes the state S_t = s and learns a policy π(a|s) that maps states to actions. The agent uses this policy to take an action A_t = a when in state S_t = s. The system then transitions to the next time instant, t + 1. The environment responds to the action A_t = a by putting the agent in a new state S_{t+1} = s' and providing feedback in the form of a reward, R_{t+1}. The agent has no control over what the new state S_{t+1} and reward R_{t+1} will be. This transition, (S_t = s, A_t = a) → (R_{t+1} = r, S_{t+1} = s'), is governed by the environment and is known as the transition dynamics. For a given pair (s, a), there could be one or more possible pairs (r, s'). In a deterministic world, a fixed combination (s, a) always produces the same single pair (r, s'). However, in stochastic environments, i.e., environments with uncertain outcomes, many different pairs (r, s') are possible for a given (s, a).
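To make this concrete, here is a minimal Python sketch of stochastic transition dynamics for a hypothetical two-state environment. The states, actions, rewards, and probabilities below are illustrative assumptions, not taken from the text; the point is only the structure: each (s, a) pair maps to a set of possible (r, s') outcomes with associated probabilities, and the environment samples one outcome per step.

```python
import random

# Hypothetical transition dynamics for a toy two-state environment.
# Each (s, a) pair maps to a list of ((r, s'), probability) outcomes.
# All values here are made up for illustration.
P = {
    ("s0", "a0"): [((1.0, "s0"), 0.7), ((0.0, "s1"), 0.3)],
    ("s0", "a1"): [((0.0, "s1"), 1.0)],   # deterministic: one outcome
    ("s1", "a0"): [((5.0, "s0"), 0.5), ((-1.0, "s1"), 0.5)],
    ("s1", "a1"): [((0.0, "s1"), 1.0)],
}

def step(s, a):
    """Environment's response to action a taken in state s:
    sample a (reward, next-state) pair from the dynamics."""
    outcomes, probs = zip(*P[(s, a)])
    (r, s_next), = random.choices(outcomes, weights=probs, k=1)
    return r, s_next

# The agent picks the action, but has no control over which
# (r, s') the environment returns:
r, s_next = step("s0", "a0")
print(f"reward={r}, next state={s_next}")
```

Note that the deterministic case is just the special case where every (s, a) pair has a single outcome with probability 1.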
