Powered by OpenAIRE graph
UPCommons. Portal de...
Recolector de Ciencia Abierta, RECOLECTA
Bachelor thesis, 2024
License: CC BY NC ND

This Research product is the result of merged Research products in OpenAIRE.

Robust bipedal locomotion through an MPC-based residual learning framework

Authors: Arribalzaga Jové, Carlos


Abstract

Bipedalism is one of the most characteristic traits of humanoids, yet achieving robust locomotion remains one of the most challenging problems in robotics. The difficulties stem from the high dimensionality and complexity of the hybrid dynamics, combined with computational constraints. Reduced-order models are typically used to address this problem; however, these models fail to capture the full dynamics of the robot. In this work, we propose a hierarchical framework that combines high-level task planning, composed of a model predictive controller (MPC) and a residual term learned through reinforcement learning, with a lower-level whole-body controller that tracks the desired task-space trajectories. The MPC uses a reduced-order model to generate a suboptimal footstep policy, which is then improved by the residual policy, which accounts for the whole-body dynamics of the robot. The MPC acts as a guide during training, facilitating efficient learning. The proposed framework is tested in simulation in three distinct scenarios: walking forward and backward, turning, and walking under external forces. The results show that the new policy improves on the MPC footstep policy and generates robust locomotion.

Bipedalism is one of the most characteristic traits of humanoids, but achieving robust locomotion remains one of the hardest problems in robotics. The difficulties arise from the complexity of the high-dimensional hybrid dynamics, combined with computational limitations. Reduced-order models are normally used to address this problem, but they do not capture the full dynamics of the robot. In this work, we propose a hierarchical framework that combines a high-level task-space planning policy, composed of a model predictive controller (MPC) and a residual term learned through reinforcement learning, with a lower-level whole-body controller that tracks the desired task-space trajectories. The MPC uses a reduced-order model to generate a suboptimal step policy, which is subsequently improved by the residual policy, which takes the whole-body dynamics of the robot into account. The MPC acts as a guide during the training process, facilitating efficient learning. The proposed framework is tested in simulation in three different scenarios: walking forward and backward, turning, and walking under external forces. The results show that the new policy improves on the MPC step policy and generates robust locomotion.

Bipedal walking is one of the most characteristic features of humanoids, yet achieving robust locomotion remains a challenging problem in robotics. The difficulties arise from the complexity of high-dimensional hybrid dynamics, combined with real-time and computational constraints. Reduced-order models are typically employed to address this problem; however, these models do not fully capture the dynamics of the robot. In this work, we propose a hierarchical framework that combines a high-level task-space planner policy, composed of a model predictive controller (MPC) and a residual term learned through reinforcement learning, with a lower-level whole-body controller that tracks the desired task-space trajectories. The MPC uses a reduced-order model to generate a suboptimal footstep policy, which is subsequently improved by the residual policy, which takes into account the full-body dynamics of the robot. The MPC acts as a guide during the training process, facilitating efficient learning. The proposed framework is tested in simulation across three scenarios: forward and backward walking, turning, and walking under external forces. The results show that the new policy improves on the MPC footstep policy and generates robust locomotion.
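
The residual idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the thesis implementation: the reduced-order model is taken to be a planar linear inverted pendulum (LIP), a one-step deadbeat foot-placement rule stands in for the MPC, and the names lip_propagate, nominal_foot_offset, corrected_foot_offset and zero_residual_policy are hypothetical. In the thesis the residual would come from a trained reinforcement learning policy; here it is a placeholder that returns zero.

```python
import math

GRAVITY = 9.81  # m/s^2


def lip_propagate(x, v, com_height, duration):
    """Closed-form LIP solution of x'' = omega^2 * x.

    x is the CoM position relative to the stance foot, v its velocity.
    Returns the state (x, v) after `duration` seconds.
    """
    omega = math.sqrt(GRAVITY / com_height)
    c = math.cosh(omega * duration)
    s = math.sinh(omega * duration)
    return x * c + (v / omega) * s, x * omega * s + v * c


def nominal_foot_offset(v_touchdown, v_desired, com_height, step_time):
    """Stand-in for the MPC (assumption): one-step deadbeat foot placement.

    Returns how far ahead of the CoM the next foot should land so that the
    LIP reaches v_desired at the end of the following step.
    """
    omega = math.sqrt(GRAVITY / com_height)
    c = math.cosh(omega * step_time)
    s = math.sinh(omega * step_time)
    return (v_touchdown * c - v_desired) / (omega * s)


def corrected_foot_offset(nominal, residual, residual_limit):
    """Add a bounded learned correction to the model-based footstep."""
    clipped = max(-residual_limit, min(residual_limit, residual))
    return nominal + clipped


def zero_residual_policy(observation):
    """Placeholder for a trained residual policy (hypothetical)."""
    return 0.0


if __name__ == "__main__":
    com_height, step_time, v_desired = 0.9, 0.4, 0.5
    # CoM starts above the stance foot, moving forward at 0.5 m/s.
    _, v_td = lip_propagate(0.0, 0.5, com_height, step_time)
    nominal = nominal_foot_offset(v_td, v_desired, com_height, step_time)
    residual = zero_residual_policy((v_td, v_desired))
    offset = corrected_foot_offset(nominal, residual, residual_limit=0.05)
    # The new stance foot is `offset` ahead of the CoM, so x starts at -offset.
    _, v_next = lip_propagate(-offset, v_td, com_height, step_time)
    print(f"foot offset: {offset:.3f} m, next-step velocity: {v_next:.3f} m/s")
```

Bounding the residual keeps the combined command close to the model-based footstep, which is one plausible way for the model-based term to guide early training; the bound of 0.05 m is an arbitrary illustrative value.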


Keywords

reinforcement learning, Robòtica, Àrees temàtiques de la UPC::Matemàtiques i estadística, model predictive control, reduced-order model, Àrees temàtiques de la UPC::Física, whole-body control, humanoid, Robotics, deep reinforcement learning in robotics, angular momentum, robust policy, bipedal locomotion, walking, hierarchical framework, Classificació AMS::68 Computer science::68T Artificial intelligence, task space learning, Reinforcement learning, Aprenentatge per reforç, Classificació AMS::70 Mechanics of particles and systems::70Q05 Control of mechanical systems, Markov decision process, inverted pendulum, residual learning

  • BIP! impact indicators, based on the underlying citation network
    selected citations (derived from selected sources): 0
    popularity (current impact or attention in the research community at large): Average
    influence (overall impact in the research community at large, diachronically): Average
    impulse (initial momentum directly after publication): Average
  • OpenAIRE UsageCounts
    views: 117
    downloads: 142
Green