Cite this article: | KONG Wei-ren, ZHOU De-yun, ZHAO Yi-yang, YANG Wan-sha. Maneuvering strategy generation algorithm for multi-UAV in close-range air combat based on deep reinforcement learning and self-play[J]. Control Theory & Applications (控制理论与应用), 2022, 39(2): 352-362. |
|
Maneuvering strategy generation algorithm for multi-UAV in close-range air combat based on deep reinforcement learning and self-play (original title: 基于深度强化学习与自学习的多无人机近距空战机动策略生成算法) |
Received: 2021-02-03 Revised: 2021-06-07 |
DOI: 10.7641/CTA.2021.10120 |
2022,39(2):352-362 |
Keywords: air combat decision-making; multi-UAV cooperation; reinforcement learning; fictitious self-play |
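Funding: Supported by the National Natural Science Foundation of China (61603299, 61612385) and the Fundamental Research Funds for the Central Universities (3102019ZX016). |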
|
Abstract |
To address maneuver decision-making in multi-UAV close-range air combat, this paper proposes a maneuvering strategy
generation algorithm based on a parameter-sharing Q network and neural fictitious self-play. First, a hybrid Markov game
model that accommodates different UAV formation sizes is designed, together with a reinforcement learning framework for
generating multi-UAV maneuver decision strategies, the parameter-sharing Q network; an autoencoder compresses the state
space to improve the efficiency of strategy learning. Then, neural fictitious self-play is used to make the maneuver
strategy converge to a Nash equilibrium strategy. Finally, simulation experiments examine the parameter selection of the
autoencoder, the training process of the strategy generation algorithm, and the rationality and transferability of the
maneuver strategy. The simulation results show that introducing the autoencoder effectively improves the efficiency of
strategy learning, and that the multi-UAV close-range air combat maneuver strategy generated by the algorithm is
reasonable and transfers well. |
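
The claim that self-play drives a strategy toward a Nash equilibrium can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it swaps in classical tabular fictitious play on rock-paper-scissors, a two-player zero-sum game chosen here purely as an assumption for illustration, in place of neural fictitious self-play on an air combat model. The names `PAYOFF` and `fictitious_self_play` are hypothetical. Each player repeatedly best-responds to the opponent's empirical action counts, and the empirical action frequencies drift toward the uniform mixed strategy, which is the Nash equilibrium of this game.

```python
import numpy as np

# Row player's payoff for rock-paper-scissors (order: rock, paper, scissors).
# The game is zero-sum, so the column player's payoff is -PAYOFF.
# This toy game is an illustrative assumption, not the paper's air combat model.
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])


def fictitious_self_play(payoff, iterations=20000):
    """Classical fictitious play: best-respond to the opponent's action history."""
    n = payoff.shape[0]
    # Start each history with one pseudo-count per action (a uniform prior).
    counts_row = np.ones(n)
    counts_col = np.ones(n)
    for _ in range(iterations):
        # Scaling counts to frequencies would not change the argmax,
        # so the raw counts are used directly.
        action_row = int(np.argmax(payoff @ counts_col))
        # Column player maximizes its own payoff, which is -payoff.
        action_col = int(np.argmax(-(counts_row @ payoff)))
        counts_row[action_row] += 1
        counts_col[action_col] += 1
    # The empirical frequencies are the learned average strategies.
    return counts_row / counts_row.sum(), counts_col / counts_col.sum()


avg_row, avg_col = fictitious_self_play(PAYOFF)
print("row player average strategy:", avg_row)
print("column player average strategy:", avg_col)
```

Only the average strategies converge; the action played at any single step keeps cycling. Neural fictitious self-play, as used in the abstract, replaces the count tables with learned networks, which this sketch does not attempt to reproduce.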
|
|
|
|
|