Consensus of discrete-time multi-agent system based on Q-learning

ZHU Zhi-bin; WANG Fu-yong; YIN Yan-hui; LIU Zhong-xin; CHEN Zeng-qiang

Cite this article: ZHU Zhi-bin, WANG Fu-yong, YIN Yan-hui, LIU Zhong-xin, CHEN Zeng-qiang. Consensus of discrete-time multi-agent system based on Q-learning[J]. Control Theory & Applications, 2021, 38(7): 997-1005.

Received: 2020-08-12; revised: 2021-02-03

DOI: 10.7641/CTA.2021.00533

2021,38(7):997-1005

Keywords: multi-agent systems; consensus; discrete-time; Q-learning

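Funding: Supported by the Natural Science Foundation of Tianjin (20JCYBJC01060, 20JCQNJC01450), the National Natural Science Foundation of China (61973175), and the Fundamental Research Funds for the Central Universities, Nankai University (63201196).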

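Author	Affiliation	E-mail
ZHU Zhi-bin	College of Artificial Intelligence, Nankai University	657707375@qq.com
WANG Fu-yong	College of Artificial Intelligence, Nankai University
YIN Yan-hui	College of Artificial Intelligence, Nankai University
LIU Zhong-xin^*	College of Artificial Intelligence, Nankai University	lzhx@nankai.edu.cn
CHEN Zeng-qiang	College of Artificial Intelligence, Nankai University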

Abstract

For a class of discrete-time multi-agent systems with unknown models, this paper proposes a Q-learning method for consensus control. The method does not depend on the system model: it uses system data to iteratively solve for the control law that minimizes a given objective function, so that the states of all agents reach consensus. Using the system data generated by each agent, policy iteration updates the control law of the multi-agent system in real time. Convergence and stability analyses of the proposed Q-learning method are also given. Finally, a computer simulation verifies the effectiveness of the proposed method.
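To make the idea of data-driven policy iteration concrete, the following is a minimal sketch and not the algorithm of the paper, whose details are not given in this abstract. It assumes a leader-following setup with identical linear agent dynamics, a quadratic cost on each follower's error to the leader, and a shared quadratic Q-function kernel `H` fitted by least squares from data pooled over all followers. The matrices `A` and `B`, the weights `Qe` and `R`, the initial gain, and all function names are hypothetical choices for illustration. The learner uses only measured errors, inputs, and costs; `A` and `B` appear only in the simulator and in a model-based reference gain used for checking.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dynamics, used ONLY by the simulator; the learner never reads A or B.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Qe, R = np.eye(2), np.eye(1)          # assumed quadratic cost weights
n, m, N = 2, 1, 4                     # state dim, input dim, number of followers
iu = np.triu_indices(n + m)


def features(z):
    """Quadratic basis with features(z) @ H[iu] == z @ H @ z for symmetric H."""
    M = np.outer(z, z)
    M = 2.0 * M - np.diag(np.diag(M))  # double the off-diagonal terms
    return M[iu]


def evaluate_policy(K, x0, X, steps=30, noise=0.5):
    """Fit the Q-kernel H of the policy u = -K e from follower data."""
    rows, costs = [], []
    for _ in range(steps):
        x0_next = A @ x0
        for i in range(N):
            e = X[i] - x0
            u = -K @ e + noise * rng.standard_normal(m)    # exploratory input
            X[i] = A @ X[i] + B @ u                        # simulator step
            e_next = X[i] - x0_next
            z = np.concatenate([e, u])
            z_next = np.concatenate([e_next, -K @ e_next])  # policy action at next step
            # Bellman equation: Q(z) - Q(z_next) = stage cost
            rows.append(features(z) - features(z_next))
            costs.append(e @ Qe @ e + u @ R @ u)
        x0 = x0_next
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
    H = np.zeros((n + m, n + m))
    H[iu] = theta
    H = H + H.T - np.diag(np.diag(H))  # rebuild the symmetric kernel
    return H, x0, X


K = np.array([[1.0, 2.0]])             # assumed initial stabilizing gain
x0 = np.array([1.0, 0.5])              # leader state
X = 2.0 * rng.standard_normal((N, n))  # follower states
for _ in range(10):
    H, x0, X = evaluate_policy(K, x0, X)       # policy evaluation
    K = np.linalg.solve(H[n:, n:], H[n:, :n])  # policy improvement

# Model-based Riccati iteration, for checking the learned gain only.
P = np.eye(n)
for _ in range(5000):
    G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Qe + A.T @ P @ (A - B @ G)
K_ref = G

# Run the learned gain without exploration noise and measure the final gap.
for _ in range(200):
    X = np.array([A @ X[i] - B @ (K @ (X[i] - x0)) for i in range(N)])
    x0 = A @ x0
final_gap = np.max(np.abs(X - x0))
print("learned gain:", K, "reference gain:", K_ref, "final gap:", final_gap)
```

The exploration noise keeps the regression well conditioned, and because the Bellman equation for the Q-function holds for any applied input, the noisy inputs do not bias the fit in this deterministic setting. Policy iteration of this kind needs an initial stabilizing gain, which is assumed known here.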