Cite this article: | YANG Hui, WANG Yu, LI Zhong-qi, FU Ya-ting, TAN Chang. Supervised SAC reinforcement learning method for heavy haul train optimization control[J]. Control Theory & Applications, 2022, 39(5): 799-808. |
|
Supervised SAC reinforcement learning method for heavy haul train optimization control |
Received: 2021-02-10 Revised: 2022-01-10 |
DOI 10.7641/CTA.2021.10132 |
2022,39(5):799-808 |
Keywords: heavy haul train; reinforcement learning; behavior cloning; expert policy |
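Funding: Supported by the National Natural Science Foundation of China (U2034211, 62003138, 61803155), the Natural Science Foundation of Jiangxi Province (20202BAB202005), the Jiangxi Provincial Science and Technology Special Project (20203AEI009), and the Jiangxi Provincial Key Youth Science Foundation Project (20192ACBL21005). |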
|
Abstract |
Heavy haul trains are an important means of transporting bulk commodities in China, but their heavy loads,
long consists, and complex line conditions make them difficult to control. This paper divides train operation
into three stages: startup traction, cruise control, and stopping braking. Based on a multi-point-mass
longitudinal dynamics model of the heavy haul train, and taking service air braking into account, the soft
actor-critic (SAC) reinforcement learning method is used. A recurrent neural network performs behavior cloning
on expert experience data, and the cloned expert policy then supervises the reinforcement learning training,
yielding a new intelligent driving control policy. The proposed policy learns efficiently from driving
experience data and keeps improving the target reward during learning until it obtains the optimal control
policy. Simulation results show that the proposed control policy outperforms the reinforcement learning
algorithm that is not supervised by the expert model: the reward rises in fewer training episodes, a higher
reward is reached, and the trained controller operates more efficiently and stably. |
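One way to picture "expert supervision" of SAC training is as an extra imitation penalty added to the usual actor objective. The sketch below is only an illustration under that assumption; the abstract does not state the paper's actual loss. The function name `supervised_actor_loss`, the squared-error penalty, and the parameters `alpha` and `bc_weight` are all hypothetical.

```python
import numpy as np


def supervised_actor_loss(log_prob, q_value, action, expert_action,
                          alpha=0.2, bc_weight=1.0):
    """Hypothetical actor loss: SAC term plus an expert-imitation penalty.

    log_prob      : (batch,) log-probability of each sampled action
    q_value       : (batch,) critic estimate for each sampled action
    action        : (batch, action_dim) actions sampled from the policy
    expert_action : (batch, action_dim) actions from the cloned expert policy
    alpha         : entropy temperature (assumed value)
    bc_weight     : weight of the imitation penalty (assumed value)
    """
    log_prob = np.asarray(log_prob, dtype=float)
    q_value = np.asarray(q_value, dtype=float)
    action = np.asarray(action, dtype=float)
    expert_action = np.asarray(expert_action, dtype=float)

    # Standard SAC actor term: minimise alpha * log pi(a|s) - Q(s, a).
    sac_term = np.mean(alpha * log_prob - q_value)

    # Assumed supervision term: mean squared distance to the expert action.
    bc_term = np.mean(np.sum((action - expert_action) ** 2, axis=-1))

    return sac_term + bc_weight * bc_term


# Toy batch of two one-dimensional actions (illustrative numbers only).
loss = supervised_actor_loss(
    log_prob=[-1.0, -1.0],
    q_value=[2.0, 2.0],
    action=[[0.5], [0.5]],
    expert_action=[[0.0], [0.0]],
)
print(loss)
```

Setting `bc_weight=0.0` recovers the unsupervised SAC actor term, which corresponds to the baseline the abstract compares against.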
|
|
|
|
|