Cite this article: LUO Jia, LI Chao-feng. The split delivery vehicle routing optimization with the residual graph convolutional network and deep reinforcement learning[J]. Control Theory & Applications, 2024, 41(6): 1123–1136.
The split delivery vehicle routing optimization with the residual graph convolutional network and deep reinforcement learning
Received: 2022-11-29; Revised: 2024-04-18
DOI: 10.7641/CTA.2023.21040
Keywords: split delivery vehicle routing problem; residual graph convolutional network; attention mechanism; deep reinforcement learning
Funding: Supported by the National Natural Science Foundation of China (62176150).
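Abstract (translated from Chinese)
The split delivery vehicle routing problem (SDVRP) arises in a wide range of logistics and distribution scenarios and has significant research value. An efficient SDVRP optimization algorithm can raise vehicle loading rates and lower distribution costs. To improve the efficiency of solving the SDVRP, this paper proposes a deep reinforcement learning algorithm based on a residual graph convolutional network (RGCN) and multi-head attention, trained with REINFORCE, that builds a feasible solution sequence step by step. First, from a reinforcement learning perspective, the SDVRP is formulated as a Markov decision process, defining the environment state, the agent's action space, the state transition function, and other elements of the sequence prediction process. Second, an encoder–decoder model is built to learn the node selection policy. The RGCN encoder reconstructs the features of the depot and customer nodes by linking the connections between nodes in the distribution network with the node features, yielding highly distinctive embedding vectors. Building on these reconstructed embeddings, an attention-based decoder incorporates dynamically changing information, such as the vehicle's remaining capacity and the customer demands, to perform decoding, so that each iteration provides multiple feasible solutions for a single instance. Finally, a REINFORCE algorithm with an average baseline is proposed to update the model parameters. Experiments on test sets of different problem sizes, standard SDVRP datasets, and real delivery tasks from JD Logistics verify the effectiveness of the proposed algorithm.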
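The MDP formulation above (state, action space, transition function) can be illustrated with a small environment sketch. Everything here is an assumption for illustration rather than the paper's definition: the state layout (coordinates, remaining demands, remaining load, current node), the masking rules, the reward as negative tour length, and the hypothetical names `SDVRPState`, `action_mask`, `step`, and `greedy_rollout`. A nearest-customer heuristic stands in for the learned encoder–decoder policy; in a learned model, a mask like `action_mask` is what an attention decoder would typically apply before its softmax, and the remaining load and demands are the dynamic features it fuses at each step. The split-delivery transition serves min(remaining demand, remaining load) on each customer visit.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class SDVRPState:
    """Hypothetical MDP state: node 0 is the depot, nodes 1..n are customers."""
    coords: List[Point]    # static features: node coordinates
    demands: List[int]     # dynamic: remaining demand per node (depot entry is 0)
    capacity: int          # vehicle capacity Q
    load: int              # dynamic: remaining vehicle load
    position: int = 0      # current node; the vehicle starts at the depot
    route: List[int] = field(default_factory=lambda: [0])


def reset(coords: List[Point], demands: List[int], capacity: int) -> SDVRPState:
    return SDVRPState(coords, list(demands), capacity, load=capacity)


def is_done(s: SDVRPState) -> bool:
    # Terminal state: all customers fully served and the vehicle back at the depot.
    return s.position == 0 and all(d == 0 for d in s.demands[1:])


def action_mask(s: SDVRPState) -> List[bool]:
    """Action space: True where moving to node i is feasible (assumed rules)."""
    # Depot: allowed unless the vehicle is already there with work left.
    mask = [s.position != 0 or is_done(s)]
    # Customer: allowed if demand remains and the vehicle still carries goods.
    mask += [d > 0 and s.load > 0 for d in s.demands[1:]]
    return mask


def step(s: SDVRPState, a: int) -> None:
    """Transition function: move to node a and update load and demands in place."""
    if not action_mask(s)[a]:
        raise ValueError(f"infeasible action {a}")
    if a == 0:
        s.load = s.capacity                 # refill at the depot
    else:
        served = min(s.demands[a], s.load)  # split delivery: serve what the load allows
        s.demands[a] -= served
        s.load -= served
    s.position = a
    s.route.append(a)


def tour_length(coords: List[Point], route: List[int]) -> float:
    # The reward is assumed to be the negative of this length.
    return sum(math.dist(coords[u], coords[v]) for u, v in zip(route, route[1:]))


def greedy_rollout(coords: List[Point], demands: List[int], capacity: int):
    """Stand-in policy: go to the nearest feasible customer, else return to the depot."""
    s = reset(coords, demands, capacity)
    while not is_done(s):
        mask = action_mask(s)
        customers = [i for i in range(1, len(mask)) if mask[i]]
        if customers:
            here = coords[s.position]
            a = min(customers, key=lambda i: (math.dist(here, coords[i]), i))
        else:
            a = 0
        step(s, a)
    return s.route, tour_length(coords, s.route)


if __name__ == "__main__":
    coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    route, length = greedy_rollout(coords, [0, 7, 4], capacity=5)
    print(route)             # [0, 1, 0, 1, 2, 0, 2, 0]
    print(round(length, 4))  # 7.4142
```

In this toy instance, customer 1's demand (7) exceeds the vehicle capacity (5), so it can only be served by splitting; the rollout also splits customer 2's demand across two visits, which is exactly the behavior a split-delivery transition must allow.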
English abstract
The split delivery vehicle routing problem (SDVRP), which arises in a wide range of delivery tasks, is of great significance. Efficient optimization algorithms can make better use of vehicle loading capacity and reduce distribution costs. To improve the performance of SDVRP optimization algorithms, we propose a deep reinforcement learning algorithm based on a residual graph convolutional network and multi-head attention that constructs the solution sequence incrementally. Specifically, we first build a Markov decision process model for the SDVRP, defining the environment state, the action space, and the transition function of the sequence generation process. Second, an encoder-decoder network is proposed to represent the stochastic policy for node selection. The residual graph convolutional network encoder takes both the connections between nodes and the node features into account to generate discriminative embeddings. An attention-based decoder then performs decoding on these embeddings, fused with the remaining vehicle capacity and the customer demands, and can produce multiple solutions for any instance. Third, the parameters of the proposed model are updated by an improved REINFORCE algorithm with an average baseline. Experiments on synthetic datasets of varying problem scales, a standard SDVRP benchmark, and Jingdong (JD) Logistics delivery tasks validate the performance of the proposed algorithm.
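One plausible reading of the "average baseline" (an assumption; the paper's exact definition may differ) is that, for each instance, the model samples K solutions and compares each solution's cost c_k with their mean b = (1/K) Σ_k c_k, giving the gradient estimate (1/K) Σ_k (c_k − b) ∇ log p_θ(π_k). The sketch below applies that estimator to a toy softmax policy over three fixed "solutions" instead of the paper's encoder–decoder, so the gradient of log p can be written by hand. The names `average_baseline_advantages`, `reinforce_grad`, and `train_toy_policy` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def average_baseline_advantages(costs: Sequence[float]) -> List[float]:
    """Advantage of each of the K solutions of one instance: its cost minus the mean cost."""
    b = sum(costs) / len(costs)
    return [c - b for c in costs]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_grad(logits: Sequence[float], actions: Sequence[int],
                   costs: Sequence[float]) -> List[float]:
    """Gradient w.r.t. the logits of (1/K) * sum_k (c_k - b) * log p(a_k).

    For a softmax policy, d log p(a) / d logit_j = 1[a == j] - p_j.
    """
    p = softmax(logits)
    adv = average_baseline_advantages(costs)
    k = len(actions)
    grad = [0.0] * len(logits)
    for a, advantage in zip(actions, adv):
        for j in range(len(logits)):
            grad[j] += advantage * ((1.0 if a == j else 0.0) - p[j]) / k
    return grad


def train_toy_policy(costs_per_action: Sequence[float], k: int = 8, steps: int = 200,
                     lr: float = 0.5, seed: int = 0) -> List[float]:
    """Toy 'instance' whose solutions have fixed costs; sample K per step, descend on cost."""
    rng = random.Random(seed)
    logits = [0.0] * len(costs_per_action)
    for _ in range(steps):
        p = softmax(logits)
        actions = rng.choices(range(len(logits)), weights=p, k=k)
        costs = [costs_per_action[a] for a in actions]
        g = reinforce_grad(logits, actions, costs)
        logits = [x - lr * gx for x, gx in zip(logits, g)]  # minimize expected cost
    return softmax(logits)


if __name__ == "__main__":
    g = reinforce_grad([0.0, 0.0, 0.0], [0, 1], [3.0, 1.0])
    print([round(x, 6) for x in g])  # [0.5, -0.5, 0.0]
    probs = train_toy_policy([3.0, 1.0, 2.0])
    print([round(x, 3) for x in probs])  # probability mass should shift toward index 1
```

A baseline computed from the instance's own samples needs no separate critic network, and because the advantages within one instance sum to zero, solutions cheaper than the instance average are reinforced while more expensive ones are suppressed. This fits naturally with a decoder that already produces several solutions per instance.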
|
|
|
|
|