引用本文:张晓楠,张建雄.随机需求车辆路径问题的价值逼近在线决策[J].控制理论与应用,2022,39(2):241~254.[点击复制]
ZHANG Xiao-nan,ZHANG Jian-xiong.Value-approximation-based online policy for vehicle routing problem with stochastic demand[J].Control Theory and Technology,2022,39(2):241~254.[点击复制]
随机需求车辆路径问题的价值逼近在线决策
Value-approximation-based online policy for vehicle routing problem with stochastic demand
摘要点击 1835  全文点击 736  投稿时间:2021-05-17  修订日期:2022-02-14
查看全文  查看/发表评论  下载PDF阅读器
DOI编号  10.7641/CTA.2021.10411
  2022,39(2):241-254
中文关键词  路径问题  随机需求  马尔可夫决策  强化学习  价值逼近算法
英文关键词  routing problems  stochastic demand  Markov decision process  reinforcement learning  value approximation iteration
基金项目  国家自然科学基金项目(71802120, 71971152), 陕西省创新能力支撑计划(2020KRM024), 陕西省教育厅专项科研计划项目(19JK0125)资助.
作者单位E-mail
张晓楠 天津大学陕西科技大学 WLxn_2010@126.com 
张建雄* 天津大学 jxzhang@tju.edu.cn 
中文摘要
      随着高效实时物流的发展, 不确定车辆路径问题面临着兼顾决策精度和实时响应能力的新挑战. 本文以应 用最为广泛的随机需求车辆路径问题为例, 研究提出一种有效的在线决策方法. 首先, 考虑多车辆同时在线, 以总旅 行成本最小化为目标, 建立马尔科夫决策模型, 并引入可信度约束和邻域半径减少策略缩小行动空间, 提高求解效 率. 其次, 设计强化学习中的价值逼近算法求解模型, 其中, 采用基函数估计期望未来成本, 并将求解过程分离为离 线训练和在线决策两个环节, 基函数的权重被离线训练并用于在线决策以减少在线决策时间, 同时, 在算法中嵌入 了邻域半径的动态更新机制. 最后, 测试多组算例验证了本文方法的有效性.
英文摘要
      With the development of effective real-time logistic, new challenges of making high-quality and real-time dynamic routing decisions have been brought to uncertain vehicle routing problem (VRP). This paper focuses on the vehicle routing problem with stochastic demand (VRPSD), a well-known uncertain VRP, and proposes an effective online method for solving it. First, considering multiple vehicles, we formulate a multi-vehicle Markov decision process (M-MDP), with the aim of minimizing the total travel cost. In the model, the credibility constraints and the neighborhood radius reduction strategy are introduced to reduce action space, which improves the efficiency. Second, we develop a reinforcement learning technology, namely value approximation iteration including offline training phase and online execution phase, to solve the model. In the method, the expected cost-to-go is estimated by a set of basis functions designed, the weight vector of basis function is trained offline to reduce online calculation time, and also, the value of neighborhood radius is dynamically updated offline. Numerical experiments show that the proposed method has good performance in both solution quality and time efficiency.