Cite this article: CHI Hai-hong, ZHOU Ming-xin. Trajectory planning for hypersonic vehicle combined with reinforcement learning and evolutionary algorithms[J]. Control Theory & Applications, 2022, 39(5): 847-856.
Trajectory planning for hypersonic vehicle combined with reinforcement learning and evolutionary algorithms
Received: 2021-06-03  Revised: 2022-03-17
DOI  10.7641/CTA.2021.10478
  2022,39(5):847-856
Keywords  reinforcement learning  deep reinforcement learning  hypersonic vehicles  trajectory planning
Funding  Supported by the National Key Research and Development Program of China (2018YFC0310102).
Author  Affiliation  E-mail
CHI Hai-hong  Harbin Engineering University  chi_hon@hrbeu.edu.cn
ZHOU Ming-xin*  Harbin Engineering University  1147596768@qq.com
Abstract
      Planning the flight trajectory of a hypersonic vehicle is difficult because of the vehicle's complex characteristics. This paper proposes an online trajectory planning algorithm for the cruise phase of a hypersonic vehicle that combines model-free reinforcement learning with the cross-entropy method. The trajectory planning problem is modeled as Markov decision processes with different degrees of missing environmental information. The agent is trained offline in a flight environment simulator using the proximal policy optimization (PPO) algorithm, and the curvature smoothness of the trajectory is ensured by increasing the temporal correlation of the agent's actions. The cross-entropy method takes the action that the trained agent produces from the observed state as prior knowledge and uses it to further optimize the planning policy online. Simulation results show that the proposed method generates trajectories with smooth curvature, achieves a high success rate in complex flight environments, and generalizes to different flight environments.
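The abstract does not say how the cross-entropy method is seeded or scored, so the following is only a minimal sketch of the general idea: a cross-entropy search whose sampling distribution starts centred on an action proposed by a trained policy. The scalar action, the bounds, the function names `cem_refine` and `toy_cost`, and the toy cost itself are all illustrative assumptions, not the authors' implementation.

```python
import random
import statistics


def cem_refine(prior_action, cost_fn, sigma=0.5, n_samples=64, n_elite=8,
               n_iters=5, low=-1.0, high=1.0, seed=0):
    """Refine a scalar action with the cross-entropy method.

    The Gaussian sampling distribution starts centred on `prior_action`,
    which stands in for the action proposed by an already-trained agent.
    All parameters here are hypothetical defaults chosen for illustration.
    """
    rng = random.Random(seed)
    mean, std = prior_action, sigma
    # Track the best action seen so far, starting from the prior itself,
    # so the result is never worse than the prior under cost_fn.
    best_action, best_cost = prior_action, cost_fn(prior_action)
    for _ in range(n_iters):
        # Sample candidates around the current mean, clipped to the bounds.
        candidates = [min(high, max(low, rng.gauss(mean, std)))
                      for _ in range(n_samples)]
        # Keep the lowest-cost candidates as the elite set.
        elites = sorted(candidates, key=cost_fn)[:n_elite]
        # Refit the Gaussian to the elites; the small constant keeps std > 0.
        mean = statistics.fmean(elites)
        std = statistics.pstdev(elites) + 1e-6
        elite_cost = cost_fn(elites[0])
        if elite_cost < best_cost:
            best_action, best_cost = elites[0], elite_cost
    return best_action


def toy_cost(action, target=0.3):
    """Hypothetical stand-in cost: squared distance to an arbitrary target."""
    return (action - target) ** 2


prior = 0.0  # assumed output of a trained policy for the current state
refined = cem_refine(prior, toy_cost)
```

In a planner of the kind the abstract describes, `toy_cost` would presumably be replaced by a trajectory cost evaluated in the flight environment, and the action would be a vector rather than a scalar; both points are assumptions here.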