
State representation for intelligent agents is a persistent challenge, since abstraction is unavoidable in large state spaces. Predictive representations offer one way to obtain state abstraction by replacing a state with a set of predictions about future interactions with the world. One such formalism is the Temporal-Difference Networks (TD-Nets) framework [2]. It splits the representation of knowledge into a question network and an answer network. The question network defines which questions (interactions) about future experience are of interest. It contains nodes, each corresponding to a single scalar prediction about a future observation given a certain sequence of interactions with the environment. The nodes are connected by links annotated with action labels; each link represents a temporal relationship between the predictions made by the nodes it connects, conditioned on its action label (more details in [2]). The answer network provides the predictive models that update the answers to the defined questions, which are the expected values of the scalar quantities in the nodes. These values can be seen as estimates of probabilities. With each action the agent executes, the predictions are updated using the answer network models to obtain a description of the new state. In classical TD-Nets, logistic regression models are used, whose weight vectors are obtained using a gradient learning approach. We propose the use of probability-valued decision trees [1] in the answer network of TD-Nets. We believe that decision trees are a particularly good choice to investigate, as they offer a different yet powerful form of generalization. Moreover, studying them aids in a better understanding of the strengths and weaknesses of TD-Nets and represents an important first step towards using them in worlds with more extensive observations. Furthermore, decision tree induction can be regarded as a prototypical example of a non-gradient learning approach.
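To make the classical answer network concrete, the following is a minimal sketch of one prediction step and one gradient step with a logistic regression model per question-network node. It is an illustration under stated assumptions, not the implementation used in the paper: the feature layout (bias, one-hot action, a single observation bit, previous predictions), the function names, and the learning rate `alpha` are all hypothetical, and the targets and conditions that the question network would supply are passed in by hand.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def build_features(y_prev, action, obs, n_actions):
    """Assumed feature layout: bias, one-hot action, observation bit, previous predictions."""
    one_hot = [1.0 if a == action else 0.0 for a in range(n_actions)]
    return [1.0] + one_hot + [float(obs)] + list(y_prev)


def answer_step(W, x):
    """One logistic regression model per node: y_i = sigmoid(w_i . x)."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in W]


def td_update(W, x, y, targets, conditions, alpha=0.1):
    """Gradient step: w_ij += alpha * c_i * (z_i - y_i) * y_i * (1 - y_i) * x_j.

    targets (z_i) and conditions (c_i) would come from the question network;
    here they are plain lists supplied by the caller.
    """
    for i, row in enumerate(W):
        delta = alpha * conditions[i] * (targets[i] - y[i]) * y[i] * (1.0 - y[i])
        for j in range(len(row)):
            row[j] += delta * x[j]
    return W


# Hypothetical usage: 3 nodes, 2 actions, weights initialised to zero.
n_actions, n_nodes = 2, 3
x = build_features([0.5] * n_nodes, action=1, obs=0, n_actions=n_actions)
W = [[0.0] * len(x) for _ in range(n_nodes)]
y = answer_step(W, x)
W = td_update(W, x, y, targets=[1.0, 0.0, 1.0], conditions=[1.0, 1.0, 0.0])
```

In the approach proposed here, the per-node logistic models in `answer_step` would be replaced by probability-valued decision trees, so the gradient step in `td_update` would give way to tree induction.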
Technology, Science & Technology, temporal-difference networks, Computer Science, Computer Science, Artificial Intelligence, probability trees
