
Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Reinforcement learning has recently received broad attention, but when it is applied to problems with large-scale discrete or continuous state spaces, the results are often unsatisfactory, and the learner may even fail to find an optimal policy. To address this problem, we establish a new generative model of the value function and use Gaussian process regression to approximate the values of state-action pairs that were never or seldom visited. We evaluate the proposed algorithm on an access-control queuing task in a cloud computing environment. The computational results demonstrate that the scheme can balance exploration and exploitation during learning and accelerate convergence to a certain extent.
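
The idea of using Gaussian process regression to estimate values for rarely visited state-action pairs can be illustrated with a minimal sketch. It is not the generative model proposed in the paper: the squared-exponential kernel, the hyperparameters, the toy data, and the names `rbf_kernel` and `gp_predict` are all assumptions chosen for illustration. The selection rule at the end (posterior mean plus one standard deviation) is one common way to trade off exploration and exploitation, assumed here rather than taken from the paper.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel (unit signal variance) between rows of a and b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_predict(x_train, y_train, x_query, noise_var=1e-2, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP at the query points."""
    k_tt = rbf_kernel(x_train, x_train, length_scale)
    k_tt = k_tt + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    chol = np.linalg.cholesky(k_tt)
    # alpha = (K + noise * I)^{-1} y, computed via the Cholesky factor
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance is 1 for this kernel
    return mean, np.maximum(var, 0.0)

# Hypothetical visited (state, action) pairs with their current Q estimates.
# State: number of free servers; action: 0 = reject, 1 = accept.
visited = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0], [4.0, 1.0]])
q_values = np.array([0.1, 0.5, 0.2, 1.0, 1.4])

# State 3 was never visited: estimate Q for both actions from the neighbors.
query = np.array([[3.0, 0.0], [3.0, 1.0]])
mean, var = gp_predict(visited, q_values, query)

# Assumed selection rule: favor actions with a high mean or high uncertainty.
ucb = mean + 1.0 * np.sqrt(var)
best_action = int(np.argmax(ucb))
```

The posterior variance is what makes the approach useful for exploration: it is small near visited pairs and approaches the prior variance far from them, so uncertain regions of the state-action space receive a larger bonus.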
