Cite this article: XIONG Chun-ping, MA Qian. Optimal output regulation of heterogeneous multi-agent systems via reinforcement learning[J]. 控制理论与应用 (Control Theory & Applications), 2025, 42(3): 491-498.
Optimal output regulation of heterogeneous multi-agent systems via reinforcement learning
Received: 2023-02-17  Revised: 2024-08-28
DOI  10.7641/CTA.2023.30067
  2025,42(3):491-498
Keywords  heterogeneous multi-agent systems  optimal output regulation  policy iteration  model-free algorithm  reinforcement learning
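Funding  Supported by the National Natural Science Foundation of China (62173183), the Hubei Provincial Science and Technology Program (2022BBA026), and the Xianning Science and Technology Program (2021JBZXM02).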
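Author  Affiliation  E-mail
XIONG Chun-ping  School of Automation, Nanjing University of Science and Technology  chunp_bear@163.com
MA Qian*  School of Automation, Nanjing University of Science and Technology  qma@njust.edu.cn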
Abstract
      This paper studies the optimal output regulation problem for heterogeneous multi-agent systems whose communication network topology contains a directed spanning tree. First, an exo-system state compensator and a state feedback controller are designed, and graph theory and Lyapunov stability theory are used to prove that the designed compensator and controller solve the general output regulation problem. Then, the optimal output regulation problem is solved by minimizing a predefined cost function. Combining optimal control theory with reinforcement learning, two algorithms for computing the optimal controller are proposed: a model-based policy iteration algorithm and a model-free off-policy algorithm. The model-free algorithm obtains the optimal controller without solving the output regulation equation and without using any information about the system dynamics. Finally, numerical simulations verify the effectiveness of the proposed algorithms.
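For intuition about the model-based policy iteration mentioned in the abstract, the following is a minimal sketch of the classical Kleinman-style policy iteration on a scalar continuous-time linear-quadratic problem. It is not the paper's multi-agent algorithm and involves no output regulation equation. The scalar plant dx/dt = a*x + b*u, the weights q and r, the initial gain k0, and both function names are assumptions chosen purely for illustration.

```python
import math


def policy_iteration_scalar_lqr(a, b, q, r, k0, tol=1e-12, max_iter=100):
    """Kleinman-style policy iteration for dx/dt = a*x + b*u with u = -k*x.

    Cost: integral of q*x**2 + r*u**2 over time (q >= 0, r > 0 assumed).
    k0 must be stabilizing, i.e. a - b*k0 < 0.
    Returns the final gain k, the value coefficient p, and iterations used.
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain k0 must be stabilizing")
    k = k0
    p = math.inf
    p_prev = math.inf
    for i in range(1, max_iter + 1):
        # Policy evaluation: solve the scalar Lyapunov equation
        # 2*(a - b*k)*p + q + r*k**2 = 0 for p.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: the new gain is k = b*p/r.
        k = b * p / r
        if abs(p_prev - p) < tol:
            return k, p, i
        p_prev = p
    return k, p, max_iter


def riccati_scalar(a, b, q, r):
    """Closed-form stabilizing root of 2*a*p - (b*p)**2/r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    k, p, n = policy_iteration_scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
    print("gain:", k, "value coefficient:", p, "iterations:", n)
    print("closed-form Riccati solution:", riccati_scalar(1.0, 1.0, 1.0, 1.0))
```

With a = b = q = r = 1 and k0 = 3, the iterates approach p = 1 + sqrt(2), which is the stabilizing solution of the scalar Riccati equation. Each step needs the model parameters a and b, which is the dependence that a model-free off-policy scheme is meant to avoid by replacing the evaluation step with quantities estimated from measured trajectories.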