Cite this article: HAN Xin-chen, YU Sheng-ping, YUAN Zhi-ming, CHENG Li-juan. High-speed railway dynamic scheduling based on Q-learning method [J]. Control Theory & Applications, 2021, 38(10): 1511-1521.
|
High-speed railway dynamic scheduling based on Q-learning method
Received: 2020-09-10  Revised: 2021-09-09
DOI: 10.7641/CTA.2021.00612
2021,38(10):1511-1521 |
Keywords: high-speed railway trains; dynamic scheduling; reinforcement learning; Q-learning
Funding: Supported by the National Natural Science Foundation of China (U1834211, 61790574, 61603262, 61773269) and the Natural Science Foundation of Liaoning Province (2020-MS-093).
|
Chinese abstract (translated)
As the backbone of the national comprehensive transportation system, high-speed railway has developed rapidly over the past decade. This rapid growth has also made the rail network more complex and geographically widespread, placing higher demands on dynamic train scheduling. Unexpected events introduce uncertain delays; in severe cases a delay propagates along the network and causes large-scale late arrivals and departures. Current manual dispatching offers little foresight or targeted response, making it difficult to adjust the affected trains quickly. To address these problems, this paper establishes a dynamic scheduling model for high-speed trains whose objective is to minimize the total delay of all trains at all stations. On this basis, a simulation environment for interacting with the agent is designed, and the Q-learning algorithm from reinforcement learning is used to solve the model. Finally, simulation examples verify the rationality of the environment and the effectiveness of Q-learning for high-speed railway dynamic scheduling, providing a sound basis for dispatchers' optimization decisions.
English abstract
As the backbone of the national comprehensive transportation system, high-speed railway has achieved rapid and vigorous development in the past decade. At the same time, this rapid development has made rail networks more complicated and widely distributed, placing higher requirements on high-speed railway scheduling. Unexpected events cause train delays, and a delay may even propagate along the network, causing large numbers of trains to arrive or depart late. However, manual scheduling lacks foresight and specificity, and it is difficult to adjust the affected trains quickly. In view of the above problems, this paper establishes a high-speed railway dynamic scheduling model whose objective function minimizes the total delay time of all trains at all stations. Based on this model, a simulation environment for interacting with the agent is designed, and the Q-learning algorithm is used to solve the model. Finally, simulation examples verify the rationality of the simulation environment and the effectiveness of the Q-learning algorithm for the dynamic scheduling problem, providing a good basis for dispatchers to make better decisions.
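The paper's model and data are not reproduced here, but the abstract's core idea (tabular Q-learning over dispatch decisions, with reward equal to negative delay) can be illustrated with a minimal toy sketch. Everything below is an illustrative assumption, not the paper's environment: two trains share one single-track section, the state is the set of trains still to dispatch, an action dispatches one of them, and the reward penalizes the delay incurred at its finish.

```python
import random
from collections import defaultdict

# Toy data (illustrative, not from the paper): ready time, running time,
# and originally scheduled finish time for two trains sharing one section.
TRAINS = {
    "A": {"ready": 2, "run": 3, "sched_finish": 3},  # A carries a 2-unit initial delay
    "B": {"ready": 0, "run": 2, "sched_finish": 2},
}

def step(remaining, clock, train):
    """Dispatch `train`; return (new_remaining, new_clock, reward)."""
    info = TRAINS[train]
    start = max(clock, info["ready"])          # section is free at `clock`
    finish = start + info["run"]
    delay = max(0, finish - info["sched_finish"])
    return remaining - {train}, finish, -delay  # reward = negative delay

def train_q(episodes=2000, alpha=0.5, gamma=1.0, eps=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # key: (frozenset of remaining trains, action)
    for _ in range(episodes):
        remaining, clock = frozenset(TRAINS), 0
        while remaining:
            acts = sorted(remaining)
            if rng.random() < eps:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: Q[(remaining, x)])
            nxt, clock, r = step(remaining, clock, a)
            best_next = max((Q[(nxt, x)] for x in nxt), default=0.0)
            Q[(remaining, a)] += alpha * (r + gamma * best_next - Q[(remaining, a)])
            remaining = nxt
    return Q

def greedy_schedule(Q):
    """Roll out the greedy policy; return (dispatch order, total delay)."""
    remaining, clock, order, total_delay = frozenset(TRAINS), 0, [], 0
    while remaining:
        a = max(sorted(remaining), key=lambda x: Q[(remaining, x)])
        remaining, clock, r = step(remaining, clock, a)
        order.append(a)
        total_delay += -r
    return order, total_delay
```

In this toy instance, dispatching B first incurs a total delay of 2 (only A's unavoidable initial delay), whereas A first incurs 7, so the learned greedy policy dispatches B before A. The paper's actual environment covers many trains and stations, but the update rule and the reward shaping by negative delay are the same idea in miniature.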
|
|
|
|
|