Cite this article: | SUN Yi-hao, YAN Chao, XIANG Xiao-jia, TANG Deng-qing, ZHOU Han, JIANG Jie. Multi-UAV collaborative pursuit method via hierarchical reinforcement learning[J]. Control Theory & Applications, 2025, 42(1): 96-108. |
|
Multi-UAV collaborative pursuit method via hierarchical reinforcement learning |
Received: 2023-06-26 Revised: 2024-10-31 |
DOI: 10.7641/CTA.2024.30439 |
2025,42(1):96-108 |
Keywords: hierarchical reinforcement learning; obstacle avoidance; collision avoidance; multi-UAV pursuit |
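Funding: Supported by the National Natural Science Foundation of China (62403240), the Natural Science Foundation of Jiangsu Province (BK20241396), and the Hunan Provincial Postgraduate Research Innovation Project (CX20240114). |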
|
Abstract |
To address the pursuit of a dynamic target in complex, obstacle-filled environments, this paper proposes a multi-UAV collaborative pursuit method based on hierarchical reinforcement learning. The method comprises two levels of learning: low-level sub-policy learning and high-level sub-policy switching. Specifically, the collaborative pursuit task is decomposed into two sub-tasks, navigation with obstacle avoidance and navigation with collision avoidance, and the corresponding low-level sub-policies are learned independently, giving each UAV the obstacle-avoidance and collision-avoidance skills it needs for collaborative pursuit. On this basis, a sparse reward function with a switching penalty is designed to train the high-level sub-policy switching module, which removes the dependence on manually defined rules and combines the low-level skills automatically. Numerical simulations and software-in-the-loop experiments show that the proposed method significantly reduces the difficulty of learning the pursuit policy and achieves the highest pursuit success rate among the compared baseline methods. |
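The two-level structure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward values, the termination conditions, or the policy interfaces, so everything named here (`SwitchRewardConfig`, `high_level_reward`, `select_action`, and all numeric constants) is a hypothetical stand-in and not the authors' implementation. The sketch only shows the general idea: the high-level module receives a sparse terminal reward plus a small penalty whenever it changes the active low-level skill, and the chosen skill's sub-policy produces the UAV action.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical skill indices for the two low-level sub-policies.
OBSTACLE_AVOIDANCE = 0
COLLISION_AVOIDANCE = 1


@dataclass
class SwitchRewardConfig:
    """Illustrative constants only; the paper's actual values are not given."""
    success_reward: float = 1.0    # granted once when the target is captured
    failure_penalty: float = -1.0  # granted once when a UAV crashes
    switch_penalty: float = 0.05   # charged each time the active skill changes


def high_level_reward(prev_skill: Optional[int], skill: int,
                      captured: bool, crashed: bool,
                      cfg: SwitchRewardConfig) -> float:
    """Sparse reward with a switching penalty for the high-level module."""
    reward = 0.0
    # Penalize a change of skill to discourage needless switching.
    if prev_skill is not None and skill != prev_skill:
        reward -= cfg.switch_penalty
    # Sparse terms: nonzero only at episode-ending events.
    if captured:
        reward += cfg.success_reward
    elif crashed:
        reward += cfg.failure_penalty
    return reward


def select_action(observation: Sequence[float], skill: int,
                  sub_policies: Sequence[Callable[[Sequence[float]], float]]) -> float:
    """Run the low-level sub-policy chosen by the high-level module."""
    return sub_policies[skill](observation)
```

In this sketch the low-level sub-policies are plain callables that would be trained separately and then frozen, so that only the skill choice is learned at the high level; whether the authors freeze them is not stated in the abstract.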
|
|
|
|
|