Cite this article: KONG Wei-ren, ZHOU De-yun, ZHAO Yi-yang, YANG Wan-sha. Maneuvering strategy generation algorithm for multi-UAV in close-range air combat based on deep reinforcement learning and self-play [J]. Control Theory & Applications (控制理论与应用), 2022, 39(2): 352-362.
Maneuvering strategy generation algorithm for multi-UAV in close-range air combat based on deep reinforcement learning and self-play
Received: 2021-02-03  Revised: 2021-06-07
DOI  10.7641/CTA.2021.10120
  2022,39(2):352-362
Keywords  air combat decision-making  multi-UAV cooperation  reinforcement learning  fictitious self-play
Funding  Supported by the National Natural Science Foundation of China (61603299, 61612385) and the Fundamental Research Funds for the Central Universities (3102019ZX016)
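Author  Affiliation  E-mail
KONG Wei-ren*  Northwestern Polytechnical University  weirenkong@qq.com
ZHOU De-yun  Northwestern Polytechnical University
ZHAO Yi-yang  Northwestern Polytechnical University
YANG Wan-sha  The University of Sydney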
Abstract
      To address maneuver decision-making in multi-UAV close-range air combat, this paper proposes a maneuvering strategy generation algorithm based on a parameter-sharing Q-network and neural fictitious self-play. First, a hybrid Markov game model applicable to different UAV formation sizes is designed, together with a reinforcement learning framework for generating multi-UAV maneuver decision strategies, the parameter-sharing Q-network; an autoencoder compresses the state space to improve the efficiency of strategy learning. Then, neural fictitious self-play is used to make the maneuver strategy converge to a Nash equilibrium strategy. Finally, simulation experiments are carried out on the parameter selection of the autoencoder, the training process of the strategy generation algorithm, and the rationality and transferability of the maneuver strategy. The simulation results show that introducing the autoencoder effectively improves the efficiency of strategy learning, and that the multi-UAV close-range air combat maneuver strategies generated by the algorithm are reasonable and transfer well.
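The abstract states that fictitious self-play drives the maneuver strategy toward a Nash equilibrium. As a rough illustration of that idea only, the sketch below runs classical (tabular) fictitious play on the two-action zero-sum game "matching pennies". This is not the paper's neural fictitious self-play or its air combat model; the game, the function name `fictitious_play`, and the iteration count are assumptions chosen for illustration.

```python
def fictitious_play(payoff, iters=5000):
    """Classical fictitious play on a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff when row plays i and column
    plays j; the column player receives -payoff[i][j].
    Returns the empirical (time-averaged) strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Action counts; each player starts with one arbitrary play of action 0.
    row_counts = [1] + [0] * (n_rows - 1)
    col_counts = [1] + [0] * (n_cols - 1)

    for _ in range(iters):
        # Value of each action against the opponent's empirical action counts.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        col_values = [
            sum(-payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Each player best-responds (ties go to the lowest action index).
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = max(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])


# Hypothetical example game: matching pennies, whose unique Nash
# equilibrium is for both players to mix 50/50.
MATCHING_PENNIES = [[1, -1], [-1, 1]]
row_avg, col_avg = fictitious_play(MATCHING_PENNIES)
print(row_avg, col_avg)
```

In this toy game the averaged strategies of both players approach the 50/50 mixed equilibrium as the number of iterations grows, even though each individual play is a deterministic best response. Neural variants replace the count tables with learned networks, which this sketch does not attempt to show.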