HAL-INSA Toulouse
Conference object . 2025
Data sources: HAL-INSA Toulouse
https://doi.org/10.1109/icra55...
Article . 2025 . Peer-reviewed
License: STM Policy #29
Data sources: Crossref

Optimizing Complex Control Systems with Differentiable Simulators: A Hybrid Approach to Reinforcement Learning and Trajectory Planning

Authors: Parag, Amit; Mansard, Nicolas; Misimi, Ekrem


Abstract


Deep reinforcement learning (RL) often relies on simulators as abstract oracles to model interactions within complex environments. While differentiable simulators have recently emerged for multi-body robotic systems, they remain underutilized despite their potential to provide richer information. This underutilization, coupled with the high computational cost of exploration-exploitation in high-dimensional state spaces, limits the practical application of RL in the real world. We propose a method that integrates learning with differentiable simulators to improve the efficiency of exploration-exploitation. Our approach learns value functions, state trajectories, and control policies from locally optimal runs of a model-based trajectory optimizer. The learned value function acts as a proxy to shorten the preview horizon, while the approximated state and control policies guide the trajectory optimization. We benchmark our algorithm on three classical control problems and a torque-controlled 7-degree-of-freedom robot manipulator, demonstrating faster convergence and a more efficient symbiotic relationship between learning and simulation for end-to-end training of complex, poly-articulated systems.
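The abstract describes a hybrid loop: a short-horizon trajectory optimizer uses a learned value function as its terminal cost and learned state/control policies as a warm start, and its locally optimal runs are in turn used as regression targets for those networks. The sketch below is a minimal, illustrative rendering of that loop, not the authors' implementation: the dynamics are a toy double integrator standing in for the differentiable simulator, `value_net`/`policy_net` are hypothetical networks, and a plain gradient loop stands in for the model-based (e.g. DDP-style) solver; all horizons, costs, and architectures are assumptions.

```python
# Minimal sketch (not the authors' code) of the hybrid learning/optimization
# loop described in the abstract. Everything here is an illustrative assumption.
import torch
import torch.nn as nn

def dynamics(x, u, dt=0.05):
    """Toy differentiable double integrator standing in for the simulator."""
    pos, vel = x[..., 0], x[..., 1]
    return torch.stack([pos + dt * vel, vel + dt * u.squeeze(-1)], dim=-1)

def running_cost(x, u):
    return (x ** 2).sum(-1) + 1e-2 * (u ** 2).sum(-1)

# Hypothetical networks for the learned value function and control policy.
value_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
policy_net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

def solve_short_horizon(x0, H=10, iters=50, lr=0.05):
    """Short-preview trajectory optimization: the learned value function is the
    terminal cost (the proxy that shortens the horizon) and the learned policy
    provides the warm start. A plain Adam loop stands in for the model-based
    trajectory optimizer used in the paper."""
    with torch.no_grad():                        # warm start from the policy
        us, x = [], x0
        for _ in range(H):
            u = policy_net(x)
            us.append(u)
            x = dynamics(x, u)
    us = torch.stack(us).requires_grad_(True)
    opt = torch.optim.Adam([us], lr=lr)
    for _ in range(iters):
        x, cost = x0, 0.0
        for t in range(H):
            cost = cost + running_cost(x, us[t])
            x = dynamics(x, us[t])
        cost = cost + value_net(x).squeeze(-1)   # value proxy replaces the tail
        opt.zero_grad()
        cost.backward()
        opt.step()
    return us.detach(), float(cost)

# Outer loop: locally optimal runs become regression targets for the networks.
net_opt = torch.optim.Adam(
    list(value_net.parameters()) + list(policy_net.parameters()), lr=1e-3)
for episode in range(200):
    x0 = torch.randn(2)
    us, cost = solve_short_horizon(x0)
    loss = (value_net(x0).squeeze(-1) - cost) ** 2 \
         + ((policy_net(x0) - us[0]) ** 2).sum()
    net_opt.zero_grad()
    loss.backward()
    net_opt.step()
```

In the paper the inner solver operates on the differentiable simulator of the actual multi-body system rather than this toy model, but the structure of the exchange is the same: the value function shortens the preview horizon, the policies warm-start the optimizer, and the optimizer's locally optimal runs supervise the networks.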

Keywords

[INFO.INFO-RB] Computer Science [cs]/Robotics [cs.RO], Value function, Reinforcement Learning (RL), Robotics, [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]
