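Maneuvering strategy generation algorithm for multi-UAV in close-range air combat based on deep reinforcement learning and self-play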

KONG Wei-ren; ZHOU De-yun; ZHAO Yi-yang; YANG Wan-sha

Cite this article: KONG Wei-ren, ZHOU De-yun, ZHAO Yi-yang, YANG Wan-sha. Maneuvering strategy generation algorithm for multi-UAV in close-range air combat based on deep reinforcement learning and self-play[J]. Control Theory & Applications, 2022, 39(2): 352-362.

Received: 2021-02-03  Revised: 2021-06-07

DOI: 10.7641/CTA.2021.10120

2022,39(2):352-362

Keywords: air combat decision-making; multi-UAV cooperation; reinforcement learning; fictitious self-play

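Funding: Supported by the National Natural Science Foundation of China (61603299, 61612385) and the Fundamental Research Funds for the Central Universities (3102019ZX016).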

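Author	Affiliation	E-mail
KONG Wei-ren*	Northwestern Polytechnical University	weirenkong@qq.com
ZHOU De-yun	Northwestern Polytechnical University
ZHAO Yi-yang	Northwestern Polytechnical University
YANG Wan-sha	The University of Sydney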

Abstract

To address maneuver decision-making in multi-UAV close-range air combat, this paper proposes a maneuvering strategy generation algorithm based on a parameter-sharing Q network and neural fictitious self-play. First, a hybrid Markov game model applicable to different UAV formation sizes is designed, together with a reinforcement learning framework for generating multi-UAV maneuver decision strategies, the parameter-sharing Q network; an autoencoder compresses the state space to improve the efficiency of strategy learning. Then, neural fictitious self-play is used to make the maneuver strategy converge to a Nash equilibrium strategy. Finally, simulation experiments examine the parameter selection of the autoencoder, the training process of the strategy generation algorithm, and the rationality and transferability of the resulting maneuver strategy. The simulation results show that introducing the autoencoder effectively improves the efficiency of strategy learning, and that the multi-UAV close-range air combat maneuver strategies generated by the algorithm are reasonable and transfer well.
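
The idea that self-play can drive a strategy toward a Nash equilibrium can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it is classical tabular fictitious play on the two-action zero-sum game "matching pennies", with no neural networks and no air combat model. The names `fictitious_play` and `MATCHING_PENNIES` are hypothetical and chosen only for this illustration. Each player repeatedly best-responds to the opponent's empirical action counts, and the empirical frequencies approach the mixed equilibrium (0.5, 0.5).

```python
import numpy as np

# Payoff matrix for the row player in matching pennies (zero-sum game).
# The column player's payoff is the negative of this matrix.
# Illustrative toy game only; it is not taken from the paper.
MATCHING_PENNIES = np.array([[1.0, -1.0],
                             [-1.0, 1.0]])


def fictitious_play(payoff, iterations):
    """Run classical fictitious play and return both players' empirical
    action frequencies.

    payoff     : (n, m) array of row-player payoffs; the column player
                 receives the negated payoff.
    iterations : number of simultaneous best-response rounds.
    """
    n_rows, n_cols = payoff.shape
    # Start from a uniform prior of one pseudo-count per action.
    row_counts = np.ones(n_rows)
    col_counts = np.ones(n_cols)

    for _ in range(iterations):
        # Best responses are computed against the opponent's counts.
        # Scaling counts to frequencies would not change argmax/argmin.
        row_action = int(np.argmax(payoff @ col_counts))  # row maximizes
        col_action = int(np.argmin(row_counts @ payoff))  # column minimizes
        # Update both players simultaneously.
        row_counts[row_action] += 1
        col_counts[col_action] += 1

    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


if __name__ == "__main__":
    row_freq, col_freq = fictitious_play(MATCHING_PENNIES, 10000)
    print("row player frequencies:", row_freq)
    print("column player frequencies:", col_freq)
```

Neural fictitious self-play, as named in the abstract, replaces these count tables with learned networks; the toy version only shows why averaging over past best responses can settle on an equilibrium mixture.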