
handle: 10230/49225
The options framework incorporates temporally extended actions (termed options) into the reinforcement learning paradigm. A wide variety of prior works experimentally illustrate the significance of options for the performance of a learning algorithm in complex domains. However, the work by Fruit et al. on the semi-Markov Decision Process (SMDP) version of the UCRL2 algorithm introduced a formal understanding of the circumstances that make options conducive to the performance of a learning algorithm. In this work we present our implementation of the algorithm proposed by Fruit et al. We perform experimentation on a navigation task characterized by a grid world domain. Using empirical Bernstein peeling as the confidence bound, we achieve a sub-linear trend in accumulated regret as well as a linear trend in accumulated reward in the grid world domain.
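The empirical Bernstein confidence bound mentioned in the abstract can be sketched as follows. This is a generic Maurer–Pontil-style empirical Bernstein bound for i.i.d. samples in [0, 1], given only as an illustration of the kind of variance-aware bound used in UCRL-style algorithms; it is not the exact peeling construction from the thesis, and the function name and interface are hypothetical.

```python
import math

def empirical_bernstein_bound(samples, delta):
    """Maurer-Pontil empirical Bernstein deviation bound for samples in [0, 1].

    With probability at least 1 - delta, the true mean lies within this
    radius of the empirical mean. Unlike a Hoeffding bound, it shrinks
    with the empirical variance, which is why UCRL-style algorithms
    benefit from it in low-variance regions of the state space.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n  # empirical (biased) variance
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))
```

As a sanity check, the radius is positive and tightens as more samples accumulate, which is the property that drives the sub-linear regret trend reported above.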
Master's thesis for: Master in Intelligent Interactive Systems
Supervisors: Anders Jonsson and M. Sadegh Talebi
Hierarchical reasoning, Reinforcement learning, Machine learning, Options framework
