
Exploring alternatives to policy search

Author: Güemes Palau, Carlos


Abstract

The field of Reinforcement Learning (RL) has received much attention in recent years as a new paradigm for solving complex problems. However, one of the main issues with the current state of the art is its computational cost. Compared with other paradigms such as supervised learning, RL requires constant interaction with the environment, which is both expensive and hard to parallelize. In this work we explore a more scalable alternative to conventional RL through the use of Evolution Strategies (ES). ES iteratively modifies the current solution by adding Gaussian noise to it, evaluating the resulting perturbations, and using their scores to guide the improvement of the solution. The advantage of ES is that creating and evaluating these perturbations can be parallelized. After introducing the network routing scenario, we used it to compare how ES performed against PPO, an RL policy gradient method. Ultimately, ES took advantage of an increasing number of workers to eventually overtake PPO, training faster while also producing better results overall. However, it was also clear that for this to occur ES must have access to a considerable amount of hardware resources, making it viable only within high performance computing environments.
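The loop described in the abstract (perturb the current solution with Gaussian noise, score the perturbations, and use the scores to update the solution) corresponds to the standard ES gradient estimator. The following is a minimal NumPy sketch of that idea, not the thesis's actual implementation; all function names, hyperparameters, and the toy fitness function are illustrative:

```python
import numpy as np

def evolution_strategy(fitness, theta, sigma=0.1, alpha=0.01,
                       population=50, iterations=300, seed=0):
    """Basic ES loop: sample Gaussian perturbations of the current
    solution, score each one, and move the solution along the
    score-weighted average of the noise directions."""
    rng = np.random.default_rng(seed)
    for _ in range(iterations):
        # Each row is one candidate perturbation; in a distributed
        # setting these evaluations would be farmed out to workers.
        noise = rng.standard_normal((population, theta.size))
        scores = np.array([fitness(theta + sigma * n) for n in noise])
        # Standardize scores so the step size is scale-invariant
        scores = (scores - scores.mean()) / (scores.std() + 1e-8)
        # Score-weighted sum of noise approximates the fitness gradient
        theta = theta + alpha / (population * sigma) * (noise.T @ scores)
    return theta

# Toy usage: maximize -||x - target||^2, whose optimum is the target
target = np.array([3.0, -2.0])
result = evolution_strategy(lambda x: -np.sum((x - target) ** 2),
                            np.zeros(2))
```

Because each candidate's evaluation is independent, the inner loop parallelizes across workers with only the shared random seed and the scalar scores needing communication, which is the scalability property the thesis exploits.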

Country
Spain
Keywords

Network Routing, Reinforcement Learning, Deep Reinforcement Learning, Evolution Strategies, Evolutionary Computation, Neural Networks (Computer Science), Graph Neural Networks, Message Passing Neural Networks, High Performance Computing, Àrees temàtiques de la UPC::Informàtica::Enginyeria del software [UPC subject areas::Computer science::Software engineering]

Metrics (BIP! citation indicators and OpenAIRE UsageCounts)
  • Selected citations: 0
  • Popularity (current attention, from the citation network): Average
  • Influence (overall impact, from the citation network, diachronically): Average
  • Impulse (initial momentum after publication): Average
  • Views: 32
  • Downloads: 64
  • Access rights: Green Open Access