Multi-Provider NFV Network Service Delegation via Average Reward Reinforcement Learning

Publication: Article, Preprint
Date: 01 Jan 2022
Embargo end date: 01 Jan 2021
Language: English
Publisher: Elsevier BV
Journal: SSRN Electronic Journal (eISSN: 1556-5068)
Funded by: EC | 5GROWTH

Authors: Bahador Bakhshi; Josep Mangues; Jorge Baranda

doi: 10.2139/ssrn.4192647, 10.2139/ssrn.4192649, 10.1016/j.comnet.2023.109611, 10.48550/arxiv.2112.13093, 10.5281/zenodo.7805055

arXiv: 2112.13093

Abstract

In multi-provider 5G/6G networks, service delegation enables administrative domains to federate in provisioning NFV network services. Admission control is fundamental in selecting the appropriate deployment domain to maximize average profit without prior knowledge of service requests' statistical distributions. This paper analyzes a general federation contract model for service delegation from several angles. First, under the assumption of known system dynamics, we obtain the theoretically optimal performance bound by formulating the admission control problem as an infinite-horizon Markov decision process (MDP) and solving it through dynamic programming. Second, we apply reinforcement learning to tackle the problem in practice when the arrival and departure rates are unknown. Because Q-learning maximizes discounted rewards, we prove that it is not an efficient solution, owing to its sensitivity to the discount factor. We then propose an average reward reinforcement learning approach (R-Learning) to find the policy that directly maximizes the average profit. Finally, we evaluate the different solutions through extensive simulations and experimentally on the 5Growth platform. Results confirm that the proposed R-Learning solution always outperforms Q-Learning and the greedy policies. Furthermore, while its optimality gap is at most 9% in the ideal simulation environment, it is competitive with the MDP solution in the experimental assessment.
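
To make the average-reward idea in the abstract concrete, the following is a minimal sketch of a tabular R-Learning update on a toy admission control problem. It is not the authors' model or code: the three-action set (reject, deploy locally, federate), the single-resource capacity, the per-step arrival, the departure probability, and the reward values are all hypothetical choices made for illustration, and the update rule follows a common textbook form of R-Learning rather than anything stated in this record.

```python
import random

# Hypothetical action set for illustration only (not taken from the paper).
REJECT, LOCAL, FEDERATE = 0, 1, 2
ACTIONS = (REJECT, LOCAL, FEDERATE)


def step(busy, action, capacity, rng, p_depart=0.2, r_local=1.0, r_fed=0.4):
    """Toy dynamics: one request arrives per step; busy slots free up at random.

    All parameter values are assumptions chosen for the example.
    """
    reward = 0.0
    if action == LOCAL and busy < capacity:
        busy += 1              # serve the request in the local domain
        reward = r_local
    elif action == FEDERATE:
        reward = r_fed         # delegate: lower profit, no local resources used
    # A LOCAL attempt at full capacity falls through and earns nothing.
    # Each busy local slot departs independently with probability p_depart.
    busy -= sum(1 for _ in range(busy) if rng.random() < p_depart)
    return busy, reward


def r_learning(capacity=3, steps=20000, alpha=0.1, beta=0.01, eps=0.1, seed=0):
    """Tabular R-Learning: learn relative action values q and average reward rho."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(capacity + 1)]
    rho = 0.0                  # running estimate of the average reward per step
    s = 0
    for _ in range(steps):
        greedy = max(ACTIONS, key=lambda a: q[s][a])
        a = rng.choice(ACTIONS) if rng.random() < eps else greedy
        s2, r = step(s, a, capacity, rng)
        # No discount factor: the reward is measured relative to rho.
        q[s][a] += alpha * (r - rho + max(q[s2]) - q[s][a])
        # Update rho only when the chosen action is greedy after the update.
        if q[s][a] == max(q[s]):
            rho += beta * (r - rho + max(q[s2]) - max(q[s]))
        s = s2
    policy = [max(ACTIONS, key=lambda a: q[st][a]) for st in range(capacity + 1)]
    return q, rho, policy


if __name__ == "__main__":
    _, rho, policy = r_learning()
    print("estimated average reward:", rho)
    print("greedy action per occupancy level:", policy)
```

The point of the sketch is the absence of a discount factor: the temporal-difference target subtracts the estimated average reward `rho` instead of discounting future values, which is what lets the learned policy target average profit directly.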

Related Organizations

Amirkabir University of Technology
Iran (Islamic Republic of)
Centre Tecnologic De Telecomunicacions De Catalunya
Spain

Keywords

Networking and Internet Architecture (cs.NI), FOS: Computer and information sciences, Admission-control, Average reward, Markov processes, Simulation platform, Learning algorithms, Multi-provider service delegation, Dynamic programming, Average reward reinforcement learning, Reinforcement learnings, Computer Science - Networking and Internet Architecture, Benchmarking, 5G mobile communication systems, Markov Decision Processes, Networks services, Reinforcement learning, Service deployment, Profitability, Prior-knowledge, R-learning

Impact by BIP!

selected citations: 2. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.