Target tracking and obstacle clearing with multi-agent expert strategy gradient

SUN Hui-hui; HU Chun-he; ZHANG Jun-guo

Cite this article:	SUN Hui-hui, HU Chun-he, ZHANG Jun-guo. Target tracking and obstacle clearing with multi-agent expert strategy gradient[J]. Control Theory & Applications, 2022, 39(10): 1854-1864.

Received: 2021-09-30 Revised: 2022-09-14

DOI: 10.7641/CTA.2022.10935

2022,39(10):1854-1864

Keywords: mobile robot; multi-agent; reinforcement learning; motion planning; expert strategy

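Funding: National Natural Science Foundation of China (61703047); Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2021312)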

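Author	Affiliation	E-mail / Address
SUN Hui-hui	Beijing Forestry University	cumtsunhui@126.com
HU Chun-he^*	Beijing Forestry University	No. 35 Qinghua East Road, Haidian District, Beijing
ZHANG Jun-guo	Beijing Forestry University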

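Abstract (translated from the Chinese)

To meet the need for efficient motion planning in target-tracking robots operating in complex environments, this paper proposes an expert-type policy gradient method based on multi-agent reinforcement learning (ML-DDPG). First, a distributed multi-Actor-Critic network architecture is built on minimal task units. Next, a reinforcement learning kinematic model and a visual sample preprocessing mechanism are established for the robot's active obstacle clearing and target tracking tasks, and on this basis an expert-strategy-guided method for estimating the optimal target value is proposed. Parallel training and centralized experience sharing then improve the training efficiency of the algorithm. Finally, the target tracking and obstacle clearing performance of ML-DDPG is tested in different task environments, and a comparison with other algorithms confirms its good transfer and generalization ability in unfamiliar environments.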

Abstract (English)

To meet the requirements of efficient motion planning for target-tracking robots in complex environments, a novel multi-agent deep deterministic policy gradient (ML-DDPG) approach based on expert knowledge is proposed. First, a distributed multi-Actor-Critic network architecture is constructed on minimal task units, and a Markov reinforcement learning kinematic model is established for the active obstacle clearing and target tracking tasks of the mobile robot. Then, a visual sample preprocessing mechanism is built with a multilayer convolutional neural network, and an optimal target value estimation method guided by an expert strategy is put forward. On top of these improvements, parallel training and centralized experience sharing raise the training efficiency of ML-DDPG. Finally, the obstacle clearing and target tracking performance is verified in different task environments on a physical simulator. Compared with state-of-the-art motion planning methods, ML-DDPG shows better transfer and generalization ability in unknown environments.