GPI-tree search: algorithms for decision-time planning with the general policy improvement theorem

Keywords: Computer. Automation, Engineering sciences. Technology

Louis Bagot; Lynn D’eer; Steven Latré; Tom De Schepper; Kevin Mets


Neural Computing and Applications

Article . 2025 . Peer-reviewed

License: CC BY NC ND

Data sources: Crossref

Institutional Repository Universiteit Antwerpen

Conference object . 2023

Data sources: Institutional Repository Universiteit Antwerpen

Article, Conference object . 11 Jul 2025 . English

Publisher: Springer Science and Business Media LLC

Journal: Neural Computing and Applications, volume 37, pages 18,989-19,007 (ISSN: 0941-0643, eISSN: 1433-3058)

Funded by: EC | euROBIN

doi: 10.1007/s00521-025-11304-4

handle: 10067/2015970151162165141

Abstract

In Reinforcement Learning, Unsupervised Skill Discovery tackles the learning of several policies for downstream task transfer. Once these skills are learnt, the question of how best to use and combine them remains an open problem. The General Policy Improvement Theorem (GPI) creates a policy stronger than any individual skill by selecting the highest-valued policy at each timestep. However, the GPI policy is unable to mix and combine the skills at decision time to formulate stronger plans. In this paper, we propose to adopt a model-based setting in order to make such planning possible, and formally show that a forward search improves on the GPI policy and any shallower searches under some approximation term. We argue for decision-time planning, and design a family of algorithms, GPI-Tree Search Algorithms, to use Monte Carlo Tree Search (MCTS) with GPI. These algorithms foster the skills and Q-value priors of the GPI framework to guide and improve the search. Our quantitative experiments show that the resulting policies are much stronger than the GPI policy alone, while our qualitative results provide a good intuitive understanding of how each method works and of the possible design choices that can be made.
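
To make the contrast in the abstract concrete, the following is a minimal sketch of GPI action selection next to a depth-limited forward search that falls back on the GPI Q-values at its leaves. It is not the paper's GPI-Tree Search family (which builds on MCTS); it is a plain exhaustive lookahead, and the toy deterministic MDP, the two constant-action skills, and every name in the code (`step`, `make_q`, `gpi_action`, `search_action`) are assumptions introduced purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = int
QFunction = Callable[[State, Action], float]

GAMMA = 0.9
ACTIONS: List[Action] = [0, 1]

# Hypothetical deterministic toy MDP: (state, action) -> (next_state, reward).
# State 3 is absorbing with zero reward.
TRANSITIONS: Dict[Tuple[State, Action], Tuple[State, float]] = {
    (0, 0): (1, 0.0), (0, 1): (3, 1.0),
    (1, 0): (3, 0.0), (1, 1): (2, 0.0),
    (2, 0): (3, 10.0), (2, 1): (3, 0.0),
    (3, 0): (3, 0.0), (3, 1): (3, 0.0),
}

def step(state: State, action: Action) -> Tuple[State, float]:
    """Assumed known model used for planning."""
    return TRANSITIONS[(state, action)]

def make_q(policy: Callable[[State], Action], horizon: int = 10) -> QFunction:
    """Q-value of a skill: take `action` once, then follow `policy` (exact here by rollout)."""
    def q(state: State, action: Action) -> float:
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            state, reward = step(state, action)
            total += discount * reward
            discount *= GAMMA
            action = policy(state)
        return total
    return q

# Two illustrative skills: one always plays action 0, the other always action 1.
Q_FUNCS: List[QFunction] = [make_q(lambda s: 0), make_q(lambda s: 1)]

def gpi_q(state: State, action: Action) -> float:
    """GPI estimate: best skill value for this state-action pair."""
    return max(q(state, action) for q in Q_FUNCS)

def gpi_action(state: State) -> Action:
    """GPI policy: act greedily with respect to the max over skill Q-values."""
    return max(ACTIONS, key=lambda a: gpi_q(state, a))

def search_value(state: State, action: Action, depth: int) -> float:
    """Depth-limited lookahead; depth 0 reduces to the GPI estimate."""
    if depth == 0:
        return gpi_q(state, action)
    next_state, reward = step(state, action)
    return reward + GAMMA * max(search_value(next_state, a, depth - 1) for a in ACTIONS)

def search_action(state: State, depth: int) -> Action:
    """Greedy action under the depth-limited search values."""
    return max(ACTIONS, key=lambda a: search_value(state, a, depth))
```

In this toy problem the best plan needs one action from each skill in sequence, so no single skill's Q-value credits action 0 in state 0: `gpi_action(0)` takes the small immediate reward, while `search_action(0, depth=1)` uses the model to look one step ahead and finds the larger delayed reward. An exhaustive lookahead like this grows exponentially with depth, which is one reason a sampling-based search such as MCTS is a natural choice.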

Related Organizations

University of Antwerp
Belgium

Impact by BIP!

selected citations: 0 (citations derived from selected sources; an alternative to the "Influence" indicator)
popularity: Average (the "current" impact/attention of the article in the research community at large, based on the underlying citation network)
influence: Average (the overall/total impact of the article in the research community at large, based on the underlying citation network, diachronically)
impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)

Access routes: Green, hybrid
