
handle: 11311/1220030
Learning in a lifelong setting, where the dynamics continually evolve, is a hard challenge for current reinforcement learning algorithms, yet it would be a much-needed feature for practical applications. In this paper, we propose an approach that learns a hyper-policy whose input is time and whose output is the parameters of the policy to be queried at that time. This hyper-policy is trained to maximize the estimated future performance, efficiently reusing past data by means of importance sampling, at the cost of introducing a controlled bias. We combine the future performance estimate with the past performance to mitigate catastrophic forgetting. To avoid overfitting to the collected data, we derive a differentiable variance bound that we embed as a penalization term. Finally, we empirically validate our approach against state-of-the-art algorithms on realistic environments, including water resource management and trading.
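The abstract is high-level; as a rough illustration, the sketch below shows how a time-conditioned hyper-policy and an importance-sampling performance estimate with a variance penalty might be wired up in PyTorch. All names (`HyperPolicy`, `penalized_is_objective`), the network architecture, and the use of the empirical variance of the importance weights as the penalty are assumptions made here for illustration, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class HyperPolicy(nn.Module):
    """Gaussian hyper-policy: maps a time index t to a distribution
    N(mu(t), diag(sigma^2)) over the policy parameters theta_t.
    Architecture is an illustrative assumption, not the paper's."""
    def __init__(self, theta_dim: int, hidden: int = 32):
        super().__init__()
        self.mu_net = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, theta_dim)
        )
        self.log_sigma = nn.Parameter(torch.zeros(theta_dim))

    def forward(self, t: torch.Tensor):
        mu = self.mu_net(t.unsqueeze(-1))          # (batch, theta_dim)
        return mu, self.log_sigma.exp().expand_as(mu)

def penalized_is_objective(hyper, t, theta, ret, behavior_logp, lam=0.1):
    """Importance-sampling estimate of expected performance under the
    current hyper-policy, reusing past (time, theta, return) tuples.
    The empirical variance of the IS weights stands in (hypothetically)
    for the paper's differentiable variance bound."""
    mu, sigma = hyper(t)
    target_logp = torch.distributions.Normal(mu, sigma).log_prob(theta).sum(-1)
    w = torch.exp(target_logp - behavior_logp)     # importance weights
    j_hat = (w * ret).mean()                       # performance estimate (controlled bias)
    return -(j_hat - lam * w.var())                # loss = -(estimate - penalty)

# One gradient step on synthetic past data (illustrative only).
hyper = HyperPolicy(theta_dim=4)
opt = torch.optim.Adam(hyper.parameters(), lr=1e-3)
t = torch.rand(64)                                 # past query times
theta = torch.randn(64, 4)                         # policy params sampled back then
ret = torch.randn(64)                              # returns observed for those params
behavior_logp = torch.distributions.Normal(0.0, 1.0).log_prob(theta).sum(-1)
loss = penalized_is_objective(hyper, t, theta, ret, behavior_logp)
opt.zero_grad(); loss.backward(); opt.step()
```

Per the abstract, the actual penalty comes from a differentiable upper bound on the estimator's variance rather than the raw empirical weight variance used in this sketch.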
FOS: Computer and information sciences, Computer Science - Machine Learning, Machine Learning (cs.LG)
| Indicator | Description | Value |
|---|---|---|
| Citations | An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 2 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
