Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis

Name: Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis
Keywords: Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control

Zeng, Sihan; Bhatt, Sujay; Koppel, Alec; Ganesh, Sumitra

Found an issue? Give us feedback

arXiv.org e-Print Ar...arrow_drop_down

arXiv.org e-Print Archive

Preprint . 2024

Data sources: arXiv.org e-Print Archive

https://dx.doi.org/10.48550/ar...

Article . 2024

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis

descriptionPublicationkeyboard_double_arrow_right Article , Preprint 01 Jan 2024Embargo end date: 01 Jan 2024Publisher:arXiv

Authors: Zeng, Sihan; Bhatt, Sujay; Koppel, Alec; Ganesh, Sumitra;

doi: 10.48550/arxiv.2408.04780

arXiv: 2408.04780

Learning in Herding Mean Field Games: Single-Loop Algorithm with Finite-Time Convergence Analysis

- Summary
- Subjects
- Metrics

Abstract

We consider discrete-time stationary mean field games (MFG) with unknown dynamics and design algorithms for finding the equilibrium with finite-time complexity guarantees. Prior solutions to the problem assume either the contraction of a mean field optimality-consistency operator or strict weak monotonicity, which may be overly restrictive. In this work, we introduce a new class of solvable MFGs, named the "fully herding class", which expands the known solvable class of MFGs and for the first time includes problems with multiple equilibria. We propose a direct policy optimization method, Accelerated Single-loop Actor Critic Algorithm for Mean Field Games (ASAC-MFG), that provably finds a global equilibrium for MFGs within this class, under suitable access to a single trajectory of Markovian samples. Different from the prior methods, ASAC-MFG is single-loop and single-sample-path. We establish the finite-time and finite-sample convergence of ASAC-MFG to a mean field equilibrium via new techniques that we develop for multi-time-scale stochastic approximation. We support the theoretical results with illustrative numerical simulations. When the mean field does not affect the transition and reward, a MFG reduces to a Markov decision process (MDP) and ASAC-MFG becomes an actor-critic algorithm for finding the optimal policy in average-reward MDPs, with a sample complexity matching the state-of-the-art. Previous works derive the complexity assuming a contraction on the Bellman operator, which is invalid for average-reward MDPs. We match the rate while removing the untenable assumption through an improved Lyapunov function.

Keywords

Optimization and Control (math.OC), FOS: Mathematics, Mathematics - Optimization and Control

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

0

Average

Green