Missile-target assignment method of naval ship based on deep reinforcement learning

XIAO You-gang; JIN Sheng-cheng; MAO Xiao; WU Guo-hua; LU Zhi-feng

Cite this article:	XIAO You-gang, JIN Sheng-cheng, MAO Xiao, WU Guo-hua, LU Zhi-feng. Missile-target assignment method of naval ship based on deep reinforcement learning[J]. Control Theory & Applications, 2024, 41(6): 990-998.

Received: 2022-08-05 Revised: 2024-05-19

DOI: 10.7641/CTA.2023.20696

2024,41(6):990-998

Keywords: air defense and anti-missile; missile-target allocation; weapon-target allocation; deep reinforcement learning

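Author	Affiliation	E-mail
XIAO You-gang	School of Traffic and Transportation Engineering, Central South University	csuxyg@csu.edu.cn
JIN Sheng-cheng	School of Traffic and Transportation Engineering, Central South University
MAO Xiao	School of Traffic and Transportation Engineering, Central South University
WU Guo-hua^*	School of Traffic and Transportation Engineering, Central South University	guohuawu@csu.edu.cn
LU Zhi-feng	Shanghai Electro-Mechanical Engineering Institute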

Abstract

To address the missile-target allocation problem in naval ship air defense and anti-missile operations under adversarial conditions, this paper proposes a deep reinforcement learning algorithm that incorporates an attention mechanism. First, a missile-target allocation model for a ship carrying multiple missile types is constructed, and the problem is formulated as a Markov decision process that reflects the multi-wave interception of targets. Next, a reinforcement learning policy network is built on the encoder-decoder framework: targets are encoded with a multi-head attention mechanism, and the decoder combines the global embedding of all targets with the embedding of each individual target to produce a reliable missile-target allocation for the ship. Finally, simulation experiments are carried out on the profit of the allocation schemes, the computation time, and the training process of the policy network. The results show that the proposed method generates high-profit missile-target allocation schemes, reduces computation time on large-scale problems by 10% to 94% relative to the baseline algorithms, and that its policy network converges quickly and stably.
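The abstract names the building blocks of the policy network (multi-head attention encoding of targets, then decoding from global and per-target embeddings) but gives no layer details. The following is only a minimal sketch of that general idea, not the authors' network: the feature sizes, the random stand-in weights, the mean-pooled global embedding, the dot-product scoring rule, and the helper names (`multi_head_self_attention`, `decode_step`, `feasible`) are all assumptions made for illustration. Residual connections, output projections, and feed-forward layers found in a full attention encoder are omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols):
    """Hypothetical stand-in for a learned weight matrix."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)]
            for _ in range(rows)]


def multi_head_self_attention(x, heads):
    """Encode n targets (n x d) with one (Wq, Wk, Wv) triple per head.

    Returns an n x (num_heads * d_head) matrix of concatenated head outputs.
    """
    outputs = []
    for wq, wk, wv in heads:
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        scale = math.sqrt(len(q[0]))
        k_t = [list(col) for col in zip(*k)]
        # Each target attends to every target via scaled dot-product attention.
        weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
        outputs.append(matmul(weights, v))
    return [sum((head[i] for head in outputs), []) for i in range(len(x))]


def decode_step(embeddings, feasible):
    """Score each target against the global embedding; mask infeasible targets.

    Assumes at least one target is feasible.
    """
    n, d = len(embeddings), len(embeddings[0])
    global_emb = [sum(e[j] for e in embeddings) / n for j in range(d)]  # mean pooling
    scores = [
        sum(g * t for g, t in zip(global_emb, emb)) / math.sqrt(d) if ok else -math.inf
        for emb, ok in zip(embeddings, feasible)
    ]
    return softmax(scores)


rng = random.Random(0)
n_targets, d_feat, n_heads, d_head = 5, 4, 2, 3  # illustrative sizes only
targets = [[rng.random() for _ in range(d_feat)] for _ in range(n_targets)]
heads = [tuple(random_matrix(rng, d_feat, d_head) for _ in range(3))
         for _ in range(n_heads)]

embeddings = multi_head_self_attention(targets, heads)
feasible = [True, True, False, True, True]  # e.g. target 2 cannot be engaged now
probs = decode_step(embeddings, feasible)
chosen = max(range(n_targets), key=probs.__getitem__)  # greedy choice
```

In a trained policy the weights would be learned by reinforcement learning and the decoder would be called once per allocation decision, with `feasible` updated as missiles are committed; here the weights are random, so `chosen` carries no tactical meaning.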