
arXiv: 2502.07715
Reinforcement Learning (RL) problems are being considered under increasingly complex structures. While tabular and linear models have been thoroughly explored, the analytical study of RL under nonlinear function approximation, especially kernel-based models, has recently gained traction, owing to their strong representational capacity and theoretical tractability. In this context, we examine the question of statistical efficiency in kernel-based RL within the reward-free RL framework, specifically asking: how many samples are required to design a near-optimal policy? Existing work addresses this question under restrictive assumptions about the class of kernel functions. We first explore this question by assuming a generative model, then relax this assumption at the cost of increasing the sample complexity by a factor of H, the length of the episode. We tackle this fundamental problem using a broad class of kernels and a simpler algorithm than prior work. Our approach derives new confidence intervals for kernel ridge regression, specific to our RL setting, which may be of broader applicability. We further validate our theoretical findings through simulations.
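For context, the confidence intervals mentioned in the abstract are built around kernel ridge regression. The sketch below illustrates only the standard construction: the kernel ridge regression mean together with a posterior-variance-style uncertainty width (as used in kernel bandit/RL analyses). It is not the paper's sharpened, RL-specific bound; the RBF kernel, regularization value, and toy data are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel matrix between the rows of A and B.
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * lengthscale**2))

def krr_predict(X, y, X_test, lam=1.0, lengthscale=1.0):
    """Kernel ridge regression mean and a GP-style uncertainty width.

    Returns (mean, width) at X_test. Width follows the standard form
    sqrt(k(x,x) - k(x)^T (K + lam I)^{-1} k(x)); the paper's confidence
    intervals refine this for the reward-free RL setting.
    """
    K = rbf_kernel(X, X, lengthscale)
    K_inv = np.linalg.inv(K + lam * np.eye(len(X)))
    k_star = rbf_kernel(X_test, X, lengthscale)          # shape (n_test, n)
    mean = k_star @ K_inv @ y
    prior_var = np.diag(rbf_kernel(X_test, X_test, lengthscale))
    width = np.sqrt(np.maximum(prior_var - np.sum((k_star @ K_inv) * k_star, 1), 0.0))
    return mean, width

# Toy usage: noisy samples of a smooth target function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
X_test = np.linspace(-3, 3, 5)[:, None]
mu, w = krr_predict(X, y, X_test, lam=0.1)
print(np.round(mu, 2), np.round(w, 2))
```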
Accepted at AISTATS 2025
Machine Learning, FOS: Computer and information sciences, Machine Learning (cs.LG)
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources; an alternative to the "Influence" indicator. | 0 |
| Popularity | Reflects the "current" impact/attention (the "hype") of the article in the research community at large, based on the underlying citation network. | Average |
| Influence | Reflects the overall/total impact of the article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of the article directly after its publication, based on the underlying citation network. | Average |
