Cite this article: YAN Gang, ZHAO Fei-ran, YE Feng, WU Jun-bo, YOU Ke-you. Meta-reinforcement learning based velocity regulation for automatic train operation[J]. Control Theory & Applications, 2022, 39(10): 1807-1814.
Meta-reinforcement learning based velocity regulation for automatic train operation
Received: 2021-07-06  Revised: 2022-05-06
DOI: 10.7641/CTA.2022.10595
Published in: 2022, 39(10): 1807-1814
Keywords: velocity regulation, Markov decision process, reinforcement learning, meta-learning, neural network
Funding: Key Program of the National Natural Science Foundation of China
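Authors, affiliations and e-mail:
YAN Gang, CRRC Zhuzhou Electric Locomotive Co., Ltd. and State Key Laboratory of Heavy-duty AC Drive Electric Locomotive Systems Integration, yan.x.gang@163.com
ZHAO Fei-ran, Department of Automation, Tsinghua University
YE Feng, CRRC Zhuzhou Electric Locomotive Co., Ltd.
WU Jun-bo, CRRC Zhuzhou Electric Locomotive Co., Ltd.
YOU Ke-you (corresponding author), Department of Automation, Tsinghua University, youky@tsinghua.edu.cn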
Abstract (Chinese version)
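      This paper considers the velocity regulation problem for automatic trains under changing railway conditions. Because railway conditions are complex and the train dynamics are uncertain, model-based controllers struggle to regulate velocity in a stable, fast and accurate manner. We propose a model-free controller that needs only a small amount of train operation data to adapt to new railway conditions. First, we model the train velocity regulation problem as a sequence of stationary, continuous Markov decision processes with unknown transition probabilities. Then, we apply meta-reinforcement learning to solve these Markov decision processes and obtain an adaptive neural-network controller. Simulations show that the model-free controller regulates velocity efficiently, adapts quickly to new environments, and satisfies the system constraints.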
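The formulation step can be illustrated with a toy model. The sketch below assumes, hypothetically, that each railway condition is a track grade plus Davis-type running resistance a + b·v + c·v² (illustrative coefficients, not values from the paper). It also assumes the state is the velocity tracking error, the action is a bounded traction/braking acceleration, and the reward is the negative squared error. Holding one condition fixed gives one stationary MDP, and a route is then a sequence of such MDPs. The names `RailCondition`, `VelocityRegulationMDP` and `rollout` are made up for this illustration.

```python
import random
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2


@dataclass(frozen=True)
class RailCondition:
    """One hypothetical railway condition (task); values are illustrative only."""
    grade: float            # track gradient approximated as sin(theta); > 0 means uphill
    a: float = 0.01         # Davis-type resistance per unit mass: a + b*v + c*v^2 (m/s^2)
    b: float = 0.0005
    c: float = 0.00002


class VelocityRegulationMDP:
    """Toy stationary MDP for a single track segment (assumed model, not the paper's).

    The controller never reads `cond`; it only observes (state, reward) samples,
    which is what "unknown transition probabilities" means for a model-free learner.
    """

    def __init__(self, cond, v_ref=20.0, dt=1.0, u_max=1.0, noise_std=0.01, seed=0):
        self.cond, self.v_ref, self.dt = cond, v_ref, dt
        self.u_max, self.noise_std = u_max, noise_std
        self.rng = random.Random(seed)
        self.v = v_ref

    def reset(self, v0):
        self.v = v0
        return self.v - self.v_ref          # state: velocity tracking error (m/s)

    def step(self, u):
        u = max(-self.u_max, min(self.u_max, u))   # traction/braking limit (system constraint)
        c = self.cond
        resistance = c.a + c.b * self.v + c.c * self.v ** 2
        accel = u - resistance - G * c.grade
        self.v = max(0.0, self.v + self.dt * accel + self.rng.gauss(0.0, self.noise_std))
        err = self.v - self.v_ref
        return err, -err ** 2               # next state, reward (penalizes squared error)


def rollout(env, gain, v0, steps=50):
    """Total reward of a simple proportional controller u = -gain * err."""
    err, total = env.reset(v0), 0.0
    for _ in range(steps):
        err, r = env.step(-gain * err)
        total += r
    return total
```

For example, `[VelocityRegulationMDP(RailCondition(g)) for g in (0.0, 0.004, -0.002)]` would model three consecutive track segments. A model-free learner interacts only through `reset` and `step`, so the grade and resistance coefficients stay hidden from it.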
Abstract (English version)
      This paper considers the velocity regulation problem for automatic train operation systems under time-varying railway conditions. Owing to the complex environment and the uncertainties in the system dynamics, most model-based controllers cannot solve this problem well. To this end, we propose a model-free controller that requires only a small amount of data to adapt to a new railway condition. First, we formulate the velocity regulation problem for the automatic train as a sequence of stationary, continuous Markov decision processes (MDPs) with unknown transition probabilities. Then, we adopt the meta-reinforcement learning framework to solve the MDPs and to train an initial neural-network controller that can quickly adapt to a new environment using observed data. Finally, we illustrate via simulations that our model-free controller regulates the train to the desired velocity and adapts well to time-varying railway conditions, while satisfying the constraints of the dynamical system. Moreover, the experiments show that the controller is robust to uncertain dynamics.
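To make the "meta-train an initial controller, then adapt with little data" idea concrete, here is a minimal sketch under strong simplifying assumptions. It is not the authors' algorithm. It swaps in Reptile, a first-order meta-learning method, and uses a two-parameter linear policy u = θ₀ + θ₁·e in place of a neural network. Gradients are estimated by central finite differences of the episode cost, so the learner only queries a black-box simulator. The toy train model, the task distribution (grade sampled in [-0.002, 0.004]), the function names and all hyperparameters are hypothetical.

```python
import random

G = 9.81  # gravitational acceleration in m/s^2


def episode_cost(theta, grade, v_ref=20.0, v0=19.0, steps=30, dt=1.0, u_max=2.0):
    """Mean squared velocity error of the linear policy u = theta[0] + theta[1] * err.

    Hypothetical toy train model: Davis-type resistance plus a grade term. The learner
    below only calls this function as a black box and never uses the model itself.
    """
    v, cost = v0, 0.0
    for _ in range(steps):
        err = v - v_ref
        u = max(-u_max, min(u_max, theta[0] + theta[1] * err))  # actuator limit
        resistance = 0.01 + 0.0005 * v + 0.00002 * v * v
        v += dt * (u - resistance - G * grade)
        cost += (v - v_ref) ** 2
    return cost / steps


def fd_gradient(f, theta, h=1e-4):
    """Central finite-difference gradient, estimated from cost evaluations only."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def adapt(theta, grade, inner_steps=5, lr=0.1):
    """Fast adaptation: a few gradient steps on one railway condition."""
    theta = list(theta)
    for _ in range(inner_steps):
        g = fd_gradient(lambda th: episode_cost(th, grade), theta)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta


def reptile(meta_iters=200, meta_lr=0.5, seed=0):
    """Reptile meta-training: move the initialization toward task-adapted parameters."""
    rng = random.Random(seed)
    theta = [0.0, -0.5]                         # [feedforward term, feedback gain]
    for _ in range(meta_iters):
        grade = rng.uniform(-0.002, 0.004)      # sample a railway condition (task)
        adapted = adapt(theta, grade)
        theta = [t + meta_lr * (a - t) for t, a in zip(theta, adapted)]
    return theta
```

After `theta0 = reptile()`, calling `adapt(theta0, grade)` on an unseen grade performs the few-step fast adaptation; the meta-learned initialization is what lets those few steps suffice. In a realistic setting one would likely use a neural-network policy and a sample-based policy-gradient estimator instead of the linear policy and finite differences.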