Cite this article: | CHI Hai-hong, ZHOU Ming-xin. Trajectory planning for hypersonic vehicle combined with reinforcement learning and evolutionary algorithms [J]. Control Theory & Applications (控制理论与应用), 2022, 39(5): 847-856. |
|
Trajectory planning for hypersonic vehicle combined with reinforcement learning and evolutionary algorithms |
Received: 2021-06-03 Revised: 2022-03-17 |
DOI: 10.7641/CTA.2021.10478 |
2022,39(5):847-856 |
Keywords: reinforcement learning; deep reinforcement learning; hypersonic vehicles; trajectory planning |
Funding: Supported by the National Key Research and Development Program of China (2018YFC0310102). |
|
Abstract |
Planning flight trajectories for hypersonic vehicles is difficult because of the vehicles' complex characteristics.
This paper proposes an online trajectory planning algorithm for the cruise phase of a hypersonic vehicle that combines
model-free reinforcement learning with the cross-entropy method. The trajectory planning problem is modeled as Markov
decision processes with different degrees of missing environmental information. The agent is trained offline in a
purpose-built flight environment simulator using the proximal policy optimization (PPO) algorithm, and the curvature
smoothness of the trajectory is ensured by increasing the temporal correlation of the agent's actions. The cross-entropy
method then takes the actions that the trained agent outputs for the observed states as prior knowledge and further
optimizes the planning policy online. Simulation results show that the proposed method generates curvature-smooth
trajectories, achieves a high success rate in complex flight environments, and generalizes to different flight
environments. |
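The idea of using a trained agent's actions as a prior for the cross-entropy method can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function name `cem_refine`, the toy cost `toy_cost`, the hard-coded `prior` sequence standing in for the output of a trained policy, and all hyperparameters. The sketch centers the sampling distribution on the prior action sequence, keeps the lowest-cost samples as elites, and refits the mean and standard deviation to those elites at each iteration.

```python
import random
import statistics


def cem_refine(prior_actions, cost_fn, n_samples=64, n_elite=8,
               n_iters=5, init_std=0.2, seed=0):
    """Refine an action sequence with the cross-entropy method.

    The sampling distribution starts centered on `prior_actions`
    (hypothetically, the actions proposed by a trained policy).
    """
    rng = random.Random(seed)
    mean = list(prior_actions)
    std = [init_std] * len(mean)
    # The prior itself is the initial incumbent, so the result is never worse.
    best, best_cost = list(mean), cost_fn(mean)
    for _ in range(n_iters):
        # Sample candidate sequences from an independent Gaussian per step.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        samples.sort(key=cost_fn)
        elites = samples[:n_elite]
        elite_cost = cost_fn(elites[0])
        if elite_cost < best_cost:
            best, best_cost = elites[0], elite_cost
        # Refit the distribution to the elite set.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [statistics.pstdev(col) + 1e-6 for col in zip(*elites)]
    return best, best_cost


def toy_cost(actions):
    """Illustrative cost: track a made-up target and penalize abrupt changes."""
    target = [0.5, 0.3, 0.1, 0.0]
    tracking = sum((a - t) ** 2 for a, t in zip(actions, target))
    smoothness = sum((b - a) ** 2 for a, b in zip(actions, actions[1:]))
    return tracking + 0.1 * smoothness


# Stand-in for the action sequence a trained agent might propose.
prior = [0.4, 0.4, 0.2, 0.1]
refined, cost = cem_refine(prior, toy_cost)
```

Because the prior sequence is kept as the initial incumbent, the refined sequence can only match or improve on the prior's cost under the chosen cost function. In an actual planner the cost would presumably come from rolling candidate actions through a flight model, which this sketch does not attempt.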