IEEE Access
Article . 2020 . Peer-reviewed
License: CC BY
Data sources: Crossref
IEEE Access
Article
License: CC BY NC ND
Data sources: UnpayWall
IEEE Access
Article . 2020
Data sources: DOAJ
https://dx.doi.org/10.60692/g8...
Other literature type . 2020
Data sources: Datacite
https://dx.doi.org/10.60692/xk...
Other literature type . 2020
Data sources: Datacite

A Multi-Critic Reinforcement Learning Method: An Application to Multi-Tank Water Systems

Authors: Juan Martinez-Piazuelo; Daniel E. Ochoa; Nicanor Quijano; Luis Felipe Giraldo


Abstract

This paper investigates the combination of reinforcement learning and neural networks applied to the data-driven control of dynamical systems. In particular, we propose a multi-critic actor-critic architecture that eases the value function learning task by distributing it across multiple neural networks. We also propose a filtered multi-critic approach that offers further performance improvements because it eases the training of the control policy. All the studied methods are evaluated in several numerical experiments on multi-tank water systems with nonlinear coupled dynamics, where control is known to be a challenging task. The simulation results show that the proposed multi-critic scheme outperforms the standard actor-critic approach in terms of the speed and sensitivity of the learning process. Moreover, the results show that the filtered multi-critic strategy outperforms the unfiltered one on these same criteria. This paper highlights the benefits of the multi-critic methodology on a state-of-the-art reinforcement learning algorithm, the deep deterministic policy gradient, and demonstrates its application to multi-tank water systems relevant to industrial process control.
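
The idea of distributing value function learning across several critics can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the reward splits additively into one component per tank, and it swaps the paper's neural networks for linear critics trained with TD(0) so the example stays short. All names (`td0_update`, `train_critics`, `n_tanks`) and the random transitions are hypothetical. Under these assumptions, each critic learns the value of its own reward component, and the summed critics recover the value that a single critic would learn from the total reward.

```python
import numpy as np

def td0_update(w, phi, phi_next, reward, alpha=0.05, gamma=0.9):
    """One TD(0) step for a linear critic V(s) = w @ phi(s)."""
    td_error = reward + gamma * (w @ phi_next) - (w @ phi)
    return w + alpha * td_error * phi

def train_critics(transitions, n_critics, n_features):
    """Train one linear critic per reward component on the same transitions."""
    weights = np.zeros((n_critics, n_features))
    for phi, phi_next, rewards in transitions:
        for k in range(n_critics):
            weights[k] = td0_update(weights[k], phi, phi_next, rewards[k])
    return weights

# Hypothetical data: random state features and one reward per tank (assumption).
rng = np.random.default_rng(0)
n_tanks, n_features = 3, 4
transitions = [
    (rng.normal(size=n_features),   # features of the current state
     rng.normal(size=n_features),   # features of the next state
     rng.normal(size=n_tanks))      # per-tank reward components
    for _ in range(200)
]

# Multi-critic: one critic per reward component.
multi = train_critics(transitions, n_tanks, n_features)

# Single critic: same transitions, but trained on the summed reward.
summed = [(p, pn, np.array([r.sum()])) for p, pn, r in transitions]
single = train_critics(summed, 1, n_features)

# The total value estimate is the sum of the individual critics.
total_value_weights = multi.sum(axis=0)
```

With linear critics the two estimates coincide exactly, because the TD(0) update is linear in the reward. With nonlinear neural critics that equivalence no longer holds, which is where a benefit such as the one the abstract reports could arise: each critic faces a simpler target.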


Keywords

Artificial neural network, reinforcement learning, actor-critic methods, Artificial intelligence, Reinforcement Learning Algorithms, Quantum mechanics, Multi-Agent Systems, Systems engineering, Task (project management), Engineering, deep deterministic policy gradient, Actor-Critic Algorithm, Machine learning, FOS: Mathematics, Adaptive Dynamic Programming, Data-driven control, Physics, Mathematical optimization, water-tank systems, Computer science, TK1-9971, Process (computing), Optimal control, Operating system, Computational Theory and Mathematics, Adaptive Dynamic Programming for Optimal Control, Physical Sciences, Nonlinear system, Electrical engineering. Electronics. Nuclear engineering, approximate dynamic programming, Mathematics

  • Impact by BIP!
    Selected citations: 17. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
Access route: gold