Powered by OpenAIRE graph
Doctoral thesis . 2020
Data sources: INRIA2

Data-Efficient Robot Learning using Priors from Simulators

Authors: Kaushik, Rituraj

Abstract

As soon as robots step out into the real, uncertain world, they have to adapt to various unanticipated situations by acquiring new skills as quickly as possible. Unfortunately, current state-of-the-art reinforcement learning algorithms (e.g., deep reinforcement learning) require a large amount of interaction time on the robot to train a new skill. In this thesis, we explore methods that allow a robot to acquire new skills through trial and error within a few minutes of physical interaction. Our primary focus is to combine prior knowledge from a simulator with the real-world experience of the robot to achieve rapid learning and adaptation. In our first contribution, we propose a novel model-based policy search algorithm called Multi-DEX that (1) is capable of finding policies in sparse-reward scenarios, (2) does not impose any constraints on the type of policy or the type of reward function, and (3) is as data-efficient as state-of-the-art model-based policy search algorithms in non-sparse-reward scenarios. In our second contribution, we propose a repertoire-based online learning algorithm called APROL, which allows a robot to adapt quickly to physical damage (e.g., a damaged leg) or environmental perturbations (e.g., terrain conditions) and still solve the given task. In this work, we use several repertoires of policies generated in simulation for a subset of the situations that the robot might face in the real world. During online learning, the robot automatically identifies the repertoire best suited to adapting to and controlling the robot in its current situation. We show that APROL outperforms several baselines, including the current state-of-the-art repertoire-based learning algorithm RTE (Reset-free Trial and Error), by solving the tasks in much less interaction time than the baselines. In our third contribution, we introduce a gradient-based meta-learning algorithm called FAMLE. FAMLE meta-trains the dynamical model of the robot on simulated data so that the model can be adapted quickly to various unseen situations using real-world observations. By using FAMLE within a model-predictive control framework, we show that our approach outperforms several model-based and model-free learning algorithms and solves the given tasks in less interaction time than the baselines.
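
The idea of choosing, from several simulation-derived priors, the one that best matches a handful of real observations can be illustrated with a toy sketch. This is not APROL or FAMLE as defined in the thesis, whose details the abstract does not give. It assumes a hypothetical 1-D point mass, three hypothetical priors that differ only in a damping factor, and a fixed-variance Gaussian likelihood as the selection rule. All names (`make_prior`, `log_likelihood`, `select_prior`) are invented for illustration.

```python
import math


def make_prior(damping):
    """Hypothetical simulated prior: a 1-D point mass with a given damping factor."""
    def predict(state, action):
        position, velocity = state
        new_velocity = damping * velocity + action
        return (position + new_velocity, new_velocity)
    return predict


def log_likelihood(predict, transitions, noise_std=0.1):
    """Gaussian log-likelihood of the observed transitions under one prior.

    noise_std is an assumed, fixed observation noise level.
    """
    log_norm = math.log(noise_std * math.sqrt(2.0 * math.pi))
    total = 0.0
    for state, action, next_state in transitions:
        predicted = predict(state, action)
        for p, o in zip(predicted, next_state):
            total += -0.5 * ((o - p) / noise_std) ** 2 - log_norm
    return total


def select_prior(priors, transitions):
    """Return the name of the prior that best explains the observed transitions."""
    return max(priors, key=lambda name: log_likelihood(priors[name], transitions))


# Priors built "in simulation" for a subset of possible situations (assumed values).
priors = {
    "nominal": make_prior(1.0),
    "high_friction": make_prior(0.5),
    "very_high_friction": make_prior(0.2),
}

# Stand-in for the real robot: a situation that matches none of the priors exactly.
real_system = make_prior(0.45)

# Collect a few "real" transitions (state, action, next_state).
transitions = []
state = (0.0, 0.0)
for action in [1.0, 0.5, -0.5, 1.0]:
    next_state = real_system(state, action)
    transitions.append((state, action, next_state))
    state = next_state

best = select_prior(priors, transitions)
print(best)
```

With only four transitions, the sketch picks the prior whose predictions are closest to what was observed, even though the real damping value is not among the simulated ones. A practical system would then keep adapting the chosen prior with further real data rather than using it unchanged.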

Keywords

Sim to real in robotics, Reinforcement learning in robotics, Adaptive robots, Data-efficient robot learning, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-RB] Computer Science [cs]/Robotics [cs.RO], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [SPI.AUTO] Engineering Sciences [physics]/Automatic

  • BIP! impact indicators
    selected citations: 0
    popularity: Average
    influence: Average
    impulse: Average