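The thinking communication network with semi-multiple communication cycles under the multi-agent deep reinforcement learning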

ZOU Qi-jie; TANG Yu; GAO Bing; ZHAO Xi-ling; ZHANG Zhe-jie

Cite this article:	ZOU Qi-jie, TANG Yu, GAO Bing, ZHAO Xi-ling, ZHANG Zhe-jie. The thinking communication network with semi-multiple communication cycles under the multi-agent deep reinforcement learning[J]. Control Theory & Applications, 2025, 42(3): 553-562.

Received: 2023-01-20 Revised: 2025-01-02

DOI: 10.7641/CTA.2023.30028

2025,42(3):553-562

Keywords: multi-agent systems; cooperative environment; deep reinforcement learning; communication network

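Funding: Supported by the National Natural Science Foundation of China (61673084) and the 2021 Liaoning Provincial Department of Education Project (LJKZ1180).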

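Author	Affiliation	Postal code
ZOU Qi-jie	College of Information Engineering, Dalian University	116622
TANG Yu	College of Information Engineering, Dalian University
GAO Bing^*	College of Information Engineering, Dalian University	116622
ZHAO Xi-ling	College of Information Engineering, Dalian University
ZHANG Zhe-jie	College of Information Engineering, Dalian University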

Abstract

To address the problems of homogeneous communication content and sparse information in multi-agent systems operating in cooperative environments, this paper proposes a thinking multi-agent communication network (TMACN) based on multi-agent deep reinforcement learning. First, to account for the differences among information sources during interaction, each agent fuses the communication it receives with its own historical experience to form inference information, and then sends this inference information as its new message, which diversifies the communication content. Second, the model builds a semi-multi-round communication strategy on top of a soft attention mechanism, which increases information saturation and thereby improves the efficiency of communication among agents. Experiments in three simulated environments (cooperative navigation, a hunting task, and traffic junction) demonstrate that TMACN improves the accuracy and stability of the system compared with other methods.
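
As a rough illustration of the fusion step described in the abstract, the sketch below lets each agent weight the messages it receives with soft attention and then mix the result with its own history vector to produce the next message. The abstract gives no equations, so everything here is an assumption: the function names `attend` and `communicate`, the scaled dot-product scoring, the `tanh` mixing rule, the `mix` weight, and the use of the agent's history vector as the attention query. The semi-multi-round schedule is not defined in the abstract, so the loop simply repeats the exchange for a fixed number of `rounds` as a placeholder; it is not the authors' TMACN implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, messages):
    """Soft attention over the other agents' messages (needs >= 2 agents).

    queries, messages: arrays of shape (n_agents, dim).
    Returns the aggregated message per agent and the attention weights.
    """
    n_agents, dim = messages.shape
    scores = queries @ messages.T / np.sqrt(dim)  # (n_agents, n_agents)
    np.fill_diagonal(scores, -np.inf)             # ignore an agent's own message
    weights = softmax(scores, axis=1)             # each row sums to 1
    return weights @ messages, weights


def communicate(history, rounds=2, mix=0.5):
    """Hypothetical multi-round exchange of fused ("inference") messages.

    history: (n_agents, dim) array standing in for each agent's own experience.
    mix: assumed weight given to the agent's own history during fusion.
    """
    messages = history.copy()  # first messages are just the agents' own states
    for _ in range(rounds):
        received, weights = attend(history, messages)
        # Fuse received communication with own history to form the new message.
        messages = np.tanh(mix * history + (1.0 - mix) * received)
    return messages, weights
```

With four agents and eight-dimensional history vectors, `communicate(history, rounds=2)` returns one fused message per agent plus the last round's attention weights, whose rows sum to one and whose diagonal is zero.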