策略梯度的超启发算法求解带容量约束车辆路径问题

张景玲; 余孟凡; 赵燕伟; 孙钰粟; 蒋玉勇

引用本文:	张景玲,余孟凡,赵燕伟,孙钰粟,蒋玉勇.策略梯度的超启发算法求解带容量约束车辆路径问题[J].控制理论与应用,2024,41(6):1111~1122.[点击复制]
	ZHANG Jing-ling,YU Meng-fan,ZHAO Yan-wei,SUN Yu-su,JIANG Yu-yong.Hyper-heuristic for the capacitated vehicle routing problem with policy gradient[J].Control Theory & Applications,2024,41(6):1111~1122.[点击复制]

策略梯度的超启发算法求解带容量约束车辆路径问题

Hyper-heuristic for the capacitated vehicle routing problem with policy gradient

摘要点击 929 全文点击 197 投稿时间：2022-07-18 修订日期：2023-09-27

查看全文查看/发表评论下载PDF阅读器

DOI编号 DOI: 10.7641/CTA.2023.20642

2024,41(6):1111-1122

中文关键词车辆路径问题强化学习关策略梯度算法神经网络超启发算法

英文关键词 vehicle routing problem reinforcement learning policy gradient neural networks hyper-heuristic

基金项目国家自然科学基金项目(61402409), 浙江省自然科学基金项目(LY19F030017)资助.

作者	单位	E-mail
张景玲^*	浙江工业大学	jlzhang@zjut.edu.cn
余孟凡	浙江工业大学
赵燕伟	浙江工业大学
孙钰粟	浙江工业大学
蒋玉勇	浙江工业大学

中文摘要

有容量车辆路径问题是组合优化问题中比较热门的问题, 它属于经典的NP-hard问题并且时间复杂度高.本文提出了一种基于策略梯度的超启发算法, 将强化学习中的确定性策略梯度算法引入到超启发算法的高层策略中的底层算法选择策略, 确定性策略梯度算法采用Actor-Critic框架, 另外为了能够在后续计算和神经网络参数更新中引用历史经验数据, 在确定性策略梯度算法中设计了经验池用于存储状态转移数据. 在超启发算法解的接受准则方面, 文中通过实验对比了3种接受准则的效果, 最终选择了自适应接受准则作为高层策略中解的接受准则. 通过对有容量车辆路径问题标准算例的计算, 并将求解结果与其他算法对比, 验证了所提算法在该问题求解上的有效性和稳定性.

英文摘要

The capacitated vehicle routing problem is popular in combinatorial optimization. It is a classic NP-hard problem with high time complexity. This paper proposed a hyper-heuristic algorithm based on policy gradient. The deterministic policy gradient algorithm in reinforcement learning is introduced into the low-level algorithm selection strategy in the high-level heuristic strategy of the hyper-heuristic algorithm. The deterministic policy gradient algorithm adopts the Actor-Critic framework. In addition, to reference historical experience data in subsequent calculations and parameter updates of neural networks, the experience pool is designed to store state transition data in a deterministic policy gradient algorithm. In terms of the acceptance criteria of the hyper-heuristic algorithm, the paper compared the effects of the three acceptance criteria through experiments, and finally, the adaptive acceptance criterion is chosen as the acceptance criterion in the high-level heuristic strategy. The effectiveness and stability of the proposed algorithm in solving the capacitated vehicle routing problem are verified by calculating the standard example and comparing with the results of other algorithms.