Cite this article: YANG Hui (杨辉), WANG Yu (王禹), LI Zhong-qi (李中奇), FU Ya-ting (付雅婷), TAN Chang (谭畅). 专家监督的SAC强化学习重载列车运行优化控制[J]. 控制理论与应用, 2022, 39(5): 799~808.
YANG Hui,WANG Yu,LI Zhong-qi,FU Ya-ting,TAN Chang.Supervised SAC reinforcement learning method for heavy haul train optimization control[J].Control Theory and Technology,2022,39(5):799~808.
专家监督的SAC强化学习重载列车运行优化控制
Supervised SAC reinforcement learning method for heavy haul train optimization control
Submitted: 2021-02-10  Revised: 2022-01-10
DOI  10.7641/CTA.2021.10132
  2022,39(5):799-808
Chinese keywords  重载列车  强化学习  行为克隆  专家策略
English keywords  heavy haul train  reinforcement learning  behavior clone  expertise strategy
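Funding  Supported by the National Natural Science Foundation of China (U2034211, 62003138, 61803155), the Natural Science Foundation of Jiangxi Province (20202BAB202005), the Jiangxi Provincial Science and Technology Special Project (20203AEI009), and the Jiangxi Provincial Key Youth Science Foundation Project (20192ACBL21005).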
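Author  Affiliation  E-mail
YANG Hui* (杨辉)  East China Jiaotong University (华东交通大学)  yhshuo@263.net
WANG Yu (王禹)  East China Jiaotong University (华东交通大学)
LI Zhong-qi (李中奇)  East China Jiaotong University (华东交通大学)
FU Ya-ting (付雅婷)  East China Jiaotong University (华东交通大学)
TAN Chang (谭畅)  East China Jiaotong University (华东交通大学)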
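Chinese abstract (translated)
      Heavy haul trains are an important mode of bulk commodity transport in China, and their heavy loads, long consists, and complex lines make them difficult to control. This paper divides train operation into three stages: startup traction, cruise control, and stopping braking. Building on a multi-mass-point longitudinal dynamics model of the heavy haul train and taking service air braking into account, it applies the soft actor-critic (SAC) reinforcement learning method, uses a recurrent neural network to perform behavior cloning on expert experience data, and lets the cloned expert strategy supervise the reinforcement learning training, yielding a new intelligent driving control strategy. The strategy learns efficiently from driving experience data and keeps raising the target reward as it learns, arriving at the optimal control strategy. Simulation results show that the proposed control strategy outperforms the reinforcement learning algorithm without expert-model supervision: the reward improves faster and reaches a higher level, and the trained controller runs more efficiently and stably.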
English abstract
      Heavy haul trains are an important means of transporting bulk commodities in China. Their heavy loads, long consists, and complex line conditions make them difficult to control. In this paper, the train operation process is divided into three stages: startup mode, cruise mode, and brake mode. Based on a multi-point mass longitudinal dynamics model of the heavy haul train, and with service air braking taken into account, the soft actor-critic (SAC) reinforcement learning method is used. A recurrent neural network is fitted to expert driving data to obtain an expert control strategy, a step known as "behavior cloning", and this cloned strategy supervises the reinforcement learning process to train a new intelligent driving control strategy. The proposed strategy learns efficiently from driving experience data, continuously improves the total reward during learning, and obtains the optimal control strategy. Simulation results show that the proposed control strategy outperforms the reinforcement learning algorithm that is not supervised by the expert model: the reward improves faster, higher rewards are obtained, and the trained controller operates more efficiently and stably.
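
One way to picture how a cloned expert strategy might supervise SAC training is as a penalty added to the actor objective. The sketch below is only an illustration under stated assumptions: the abstract does not give the supervision mechanism, so the squared-error penalty, the function name `supervised_actor_loss`, and the parameters `alpha` and `bc_weight` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def supervised_actor_loss(log_prob, q_value, policy_action, expert_action,
                          alpha=0.2, bc_weight=1.0):
    """Hypothetical actor objective: standard SAC term plus an expert penalty.

    log_prob      : log pi(a|s) of the sampled actions, shape (batch,)
    q_value       : critic estimate Q(s, a) for those actions, shape (batch,)
    policy_action : actions proposed by the SAC actor, shape (batch,)
    expert_action : actions proposed by the cloned expert policy, shape (batch,)
    alpha         : entropy temperature (assumed value)
    bc_weight     : weight of the expert-supervision penalty (assumed value)
    """
    log_prob = np.asarray(log_prob, dtype=float)
    q_value = np.asarray(q_value, dtype=float)
    policy_action = np.asarray(policy_action, dtype=float)
    expert_action = np.asarray(expert_action, dtype=float)

    # Standard SAC actor loss: minimize E[alpha * log pi(a|s) - Q(s, a)].
    sac_term = np.mean(alpha * log_prob - q_value)

    # Assumed supervision term: mean squared distance to the expert's action.
    bc_term = np.mean((policy_action - expert_action) ** 2)

    return sac_term + bc_weight * bc_term


# Toy batch of two transitions with made-up numbers.
loss_supervised = supervised_actor_loss(
    log_prob=[-1.0, -1.0], q_value=[2.0, 2.0],
    policy_action=[0.5, 0.5], expert_action=[0.0, 1.0])
loss_plain = supervised_actor_loss(
    log_prob=[-1.0, -1.0], q_value=[2.0, 2.0],
    policy_action=[0.5, 0.5], expert_action=[0.0, 1.0], bc_weight=0.0)
```

Setting `bc_weight` to zero recovers the unsupervised SAC actor loss, which corresponds to the baseline the abstract compares against; a positive weight pulls the actor toward the expert's actions.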