Cite this article: YANG Yuan-yuan, HU Rong, QIAN Bin, ZHANG Chang-sheng, JIN Huai-ping. Deep reinforcement learning algorithm for dynamic flow shop real-time scheduling problem[J]. Control Theory & Applications, 2024, 41(6): 1047-1055.
Deep reinforcement learning algorithm for dynamic flow shop real-time scheduling problem
Received: 2022-10-19    Revised: 2023-05-15
DOI: 10.7641/CTA.2023.20916
2024, 41(6): 1047-1055
Keywords: flow shop scheduling; arrival of new jobs; deep reinforcement learning; dynamic real-time scheduling; intelligent scheduling
Funding: Supported by the National Natural Science Foundation of China (62173169, 61963022) and the Yunnan Fundamental Research Key Project (202201AS070030).
Author  Affiliation  E-mail
YANG Yuan-yuan  Faculty of Information Engineering and Automation, Kunming University of Science and Technology  yangyuanyuan0730@163.com
HU Rong*  Faculty of Information Engineering and Automation, Kunming University of Science and Technology  ronghu@vip.163.com
QIAN Bin  Faculty of Information Engineering and Automation, Kunming University of Science and Technology
ZHANG Chang-sheng  Faculty of Information Engineering and Automation, Kunming University of Science and Technology
JIN Huai-ping  Faculty of Information Engineering and Automation, Kunming University of Science and Technology
Abstract (Chinese)
      For the dynamic flow shop scheduling problem (DFSP) with the objective of minimizing the maximum completion time, this paper proposes an adaptive deep reinforcement learning algorithm (ADRLA). First, the dynamic arrival of new jobs in the DFSP is modeled as a Poisson process, and the solving process of the DFSP is then described by a Markov decision process (MDP), which transforms the DFSP into a sequential decision problem solvable by reinforcement learning. Next, according to the characteristics of the DFSP's sequencing model, a state feature vector with good discrimination between states and good generalization is designed; on this basis, five specific actions (i.e., scheduling rules) are proposed to select the job to be processed next, and a reward function based on the problem characteristics is constructed to obtain the evaluation value of each executed action (i.e., the reward value), thereby determining the three basic elements of ADRLA. Furthermore, a deep double Q-network (DDQN) is adopted as the agent in ADRLA to make scheduling decisions. After being trained on a data set determined by a small number of small-scale DFSP instances (i.e., data of the three basic elements on different problems), the agent can fairly accurately characterize the nonlinear relationship between the state feature vector and the Q-value vector (composed of the Q-value of each action) for DFSPs of different scales, and can thus perform adaptive real-time scheduling for DFSPs of various scales. Finally, simulation experiments on different test problems and comparisons with existing algorithms verify the effectiveness and real-time performance of the proposed ADRLA in solving the DFSP.
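To make the problem setup above more tangible, the short Python sketch below simulates new-job arrivals as a Poisson process (i.i.d. exponential inter-arrival gaps) and shows how a handful of rule-based actions could each pick the next job to process. The five rules shown (SPT, LPT, FIFO, LIFO, random) and the job fields are illustrative assumptions only, not the specific actions, state features, or reward function defined in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_arrivals(rate, horizon):
    """Sample new-job arrival times on [0, horizon) from a Poisson process
    with intensity `rate`: inter-arrival gaps are i.i.d. Exponential(1/rate)."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)
        if t >= horizon:
            return times
        times.append(t)

# Hypothetical rule-based actions: each maps the set of waiting jobs to one job.
def spt(jobs):   return min(jobs, key=lambda j: j["total_proc"])   # shortest total processing time
def lpt(jobs):   return max(jobs, key=lambda j: j["total_proc"])   # longest total processing time
def fifo(jobs):  return min(jobs, key=lambda j: j["arrival"])      # earliest arrival first
def lifo(jobs):  return max(jobs, key=lambda j: j["arrival"])      # latest arrival first
def rand(jobs):  return jobs[rng.integers(len(jobs))]              # uniformly random job

ACTIONS = [spt, lpt, fifo, lifo, rand]   # the agent picks one rule per decision point

if __name__ == "__main__":
    arrivals = sample_arrivals(rate=0.5, horizon=20.0)
    jobs = [{"arrival": a, "total_proc": float(rng.uniform(1, 10))} for a in arrivals]
    if jobs:
        picked = ACTIONS[0](jobs)  # e.g. the SPT action
        print(f"{len(jobs)} jobs arrived; SPT picks total_proc = {picked['total_proc']:.2f}")
```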
Abstract (English)
      For the dynamic flow shop scheduling problem (DFSP), an adaptive deep reinforcement learning algorithm (ADRLA) is proposed to minimize the maximum completion time. Firstly, the dynamic arrival of new jobs in the DFSP is modeled as a Poisson process, and the solving process of the DFSP is described by a Markov decision process (MDP), so as to transform the DFSP into a sequential decision problem that can be solved by reinforcement learning. Then, according to the characteristics of the DFSP scheduling model, a state feature vector with good state-feature discrimination and generalization is designed, and on this basis five specific actions (i.e., scheduling rules) are proposed to select the job to be processed next; meanwhile, a reward function based on the problem characteristics is constructed to obtain the evaluation value of each executed action (i.e., the reward value), thus determining the three basic elements of ADRLA. Furthermore, the deep double Q-network (DDQN) is used as the agent in ADRLA to make scheduling decisions. After training with a data set determined by a small number of small-scale DFSPs (i.e., the data of the three basic elements on different problems), the agent can accurately characterize the nonlinear relationship between the state feature vector and the Q-value vector (composed of the Q-value of each action) for DFSPs of different scales, so as to carry out adaptive real-time scheduling for DFSPs of various scales. Finally, simulation experiments on different test problems and comparisons with existing algorithms verify the effectiveness and real-time performance of the proposed ADRLA in solving the DFSP.
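As a rough, self-contained illustration of the DDQN update mentioned above, in which an online network selects the greedy next action while a separate target network evaluates it, here is a minimal PyTorch sketch. The state dimension, network sizes, discount factor, and optimizer settings are placeholder assumptions and do not reproduce the paper's actual state features, scheduling rules, or training configuration.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 6, 5, 0.95   # assumed sizes; one Q-value per scheduling rule

def make_q_net():
    # Small MLP mapping a state-feature vector to a Q-value vector over the actions.
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

online, target = make_q_net(), make_q_net()
target.load_state_dict(online.state_dict())   # target starts as a copy of the online net
opt = torch.optim.Adam(online.parameters(), lr=1e-3)

def ddqn_step(s, a, r, s_next, done):
    """One Double-DQN update: the online net selects the next action,
    the target net evaluates it (decoupling selection from evaluation)."""
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)          # Q(s, a) for the taken actions
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=1, keepdim=True)     # greedy action from online net
        q_next = target(s_next).gather(1, a_star).squeeze(1)    # evaluated by target net
        y = r + GAMMA * (1.0 - done) * q_next                   # bootstrapped target
    loss = nn.functional.mse_loss(q, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with a random mini-batch of transitions (state, action, reward, next state, done).
B = 32
loss = ddqn_step(torch.randn(B, STATE_DIM), torch.randint(0, N_ACTIONS, (B,)),
                 torch.randn(B), torch.randn(B, STATE_DIM), torch.zeros(B))
```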