Citation: LEI Yuan-long, XIE Peng, LIU Ye-hua, CHEN Hong-zheng, ZHU Jing-si, SHENG Shou-zhao. EP-DDPG-guided carrier landing control system[J]. Control Theory & Applications, 2025, 42(10): 1904-1913.
EP-DDPG-guided carrier landing control system
Received: 2023-07-14  Revised: 2025-06-21
DOI  10.7641/CTA.2020.90085
  2025,42(10):1904-1913
Keywords  reinforcement learning  deep deterministic policy gradient algorithm  MAGIC CARPET  actor-critic  BP neural network
Funding  Supported by the Aeronautical Science Foundation of China (20220058052002).
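Author  Affiliation  E-mail
LEI Yuan-long  College of Automation Engineering, Nanjing University of Aeronautics and Astronautics  2503050830@qq.com
XIE Peng  College of Automation Engineering, Nanjing University of Aeronautics and Astronautics
LIU Ye-hua  College of Automation Engineering, Nanjing University of Aeronautics and Astronautics
CHEN Hong-zheng  College of Automation Engineering, Nanjing University of Aeronautics and Astronautics
ZHU Jing-si  College of Automation Engineering, Nanjing University of Aeronautics and Astronautics
SHENG Shou-zhao*  College of Automation Engineering, Nanjing University of Aeronautics and Astronautics  3286608885@qq.com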
Abstract
      To improve the control accuracy of carrier-based aircraft in the longitudinal channel, this paper aims to ensure that the aircraft lands along the desired glide path with a reasonable attitude and speed. Taking the deep deterministic policy gradient algorithm as the basic optimization framework, it proposes an adaptive controller-parameter adjustment strategy based on the expert policy-deep deterministic policy gradient (EP-DDPG) algorithm. First, a MAGIC CARPET carrier landing control system is built as the base architecture. Second, to improve the adaptability and robustness of the controller, a deep deterministic policy gradient (DDPG) algorithm is designed on the actor-critic framework to adjust the controller parameters online. Finally, to address the low efficiency and poor performance of conventional reinforcement learning algorithms in early training, an expert policy is constructed from a back-propagation (BP) neural network to guide the training of the agent, and a guidance-exploration coordination module is designed to make policy decisions, ensuring that the action policy is reasonable and the algorithm is efficient. Simulation results show that, compared with conventional controllers, the proposed algorithm greatly improves control accuracy and robustness.
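
The abstract does not say how the guidance-exploration coordination module chooses between the expert policy and the agent's own exploration. The Python sketch below shows one plausible scheme and should not be read as the authors' method: it assumes the expert is followed with a probability that decays exponentially over training episodes, and that the actor's action otherwise receives Gaussian exploration noise. The names `guidance_probability` and `coordinate`, the decay schedule, the noise level, and the two-element gain vectors are all hypothetical.

```python
import random


def guidance_probability(episode, p0=0.9, decay=0.99, p_min=0.0):
    """Probability of following the expert policy at a given episode.

    Assumption: exponential decay from p0 toward p_min. The paper's
    actual schedule is not given in the abstract.
    """
    return max(p_min, p0 * decay ** episode)


def coordinate(expert_action, actor_action, episode, rng, noise_std=0.05):
    """Return (action, source): the expert's action, or the actor's
    action perturbed by Gaussian exploration noise."""
    if rng.random() < guidance_probability(episode):
        return list(expert_action), "expert"
    noisy = [a + rng.gauss(0.0, noise_std) for a in actor_action]
    return noisy, "actor"


rng = random.Random(0)
expert = [1.2, 0.4]  # hypothetical controller gains from the BP expert network
actor = [1.0, 0.5]   # hypothetical controller gains from the DDPG actor

# Count how often the expert is followed early and late in training.
early = [coordinate(expert, actor, 0, rng)[1] for _ in range(1000)]
late = [coordinate(expert, actor, 500, rng)[1] for _ in range(1000)]
print(early.count("expert"), late.count("expert"))
```

Under these assumptions the expert dominates early episodes, when the untrained actor's outputs are unreliable, and the actor takes over as training progresses. In a full DDPG loop the selected action would be applied to the controller and the resulting transition stored in the replay buffer.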