Cite this article: YANG Yuan-yuan, HU Rong, QIAN Bin, ZHANG Chang-sheng, JIN Huai-ping. Deep reinforcement learning algorithm for dynamic flow shop real-time scheduling problem[J]. Control Theory & Applications, 2024, 41(6): 1047-1055.
Deep reinforcement learning algorithm for dynamic flow shop real-time scheduling problem
Received: 2022-10-19    Revised: 2023-05-15
DOI: 10.7641/CTA.2023.20916
2024, 41(6): 1047-1055
Keywords: flow shop scheduling; arrival of new jobs; deep reinforcement learning; dynamic real-time scheduling; intelligent scheduling
Funding: Supported by the National Natural Science Foundation of China (62173169, 61963022) and the Yunnan Fundamental Research Key Project (202201AS070030).
Author  Affiliation  E-mail
YANG Yuan-yuan  Faculty of Information Engineering and Automation, Kunming University of Science and Technology  yangyuanyuan0730@163.com
HU Rong*  Faculty of Information Engineering and Automation, Kunming University of Science and Technology  ronghu@vip.163.com
QIAN Bin  Faculty of Information Engineering and Automation, Kunming University of Science and Technology
ZHANG Chang-sheng  Faculty of Information Engineering and Automation, Kunming University of Science and Technology
JIN Huai-ping  Faculty of Information Engineering and Automation, Kunming University of Science and Technology
Abstract (Chinese)
      For the dynamic flow shop scheduling problem (DFSP) with the objective of minimizing the maximum completion time, this paper proposes an adaptive deep reinforcement learning algorithm (ADRLA). First, the dynamic arrival of new jobs in the DFSP is modeled as a Poisson process, and the solving process of the DFSP is then described by a Markov decision process (MDP), which transforms the DFSP into a sequential decision problem solvable by reinforcement learning. Next, according to the characteristics of the DFSP's sequencing model, a state feature vector with good discrimination between states and good generalization is designed; on this basis, five specific actions (i.e., scheduling rules) are proposed to select the job to be processed next, and a reward function based on the problem characteristics is constructed to obtain the evaluation value of each executed action (i.e., the reward value), thereby determining the three basic elements of ADRLA. Furthermore, a deep double Q-network (DDQN) is adopted as the agent in ADRLA to make scheduling decisions. After being trained on a data set determined by a small number of small-scale DFSP instances (i.e., data of the three basic elements on different problems), the agent can fairly accurately characterize the nonlinear relationship between the state feature vector and the Q-value vector (composed of the Q-value of each action) for DFSPs of different scales, and can thus perform adaptive real-time scheduling for DFSPs of various scales. Finally, simulation experiments on different test problems and comparisons with existing algorithms verify the effectiveness and real-time performance of the proposed ADRLA in solving the DFSP.
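To make the problem setup above more tangible, the short Python sketch below simulates new-job arrivals as a Poisson process (i.i.d. exponential inter-arrival gaps) and shows how a handful of rule-based actions could each pick the next job to process. The five rules shown (SPT, LPT, FIFO, LIFO, random) and the job fields are illustrative assumptions only, not the specific actions, state features, or reward function defined in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_arrivals(rate, horizon):
    """Sample new-job arrival times on [0, horizon) from a Poisson process
    with intensity `rate`: inter-arrival gaps are i.i.d. Exponential(1/rate)."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)
        if t >= horizon:
            return times
        times.append(t)

# Hypothetical rule-based actions: each maps the set of waiting jobs to one job.
def spt(jobs):   return min(jobs, key=lambda j: j["total_proc"])   # shortest total processing time
def lpt(jobs):   return max(jobs, key=lambda j: j["total_proc"])   # longest total processing time
def fifo(jobs):  return min(jobs, key=lambda j: j["arrival"])      # earliest arrival first
def lifo(jobs):  return max(jobs, key=lambda j: j["arrival"])      # latest arrival first
def rand(jobs):  return jobs[rng.integers(len(jobs))]              # uniformly random job

ACTIONS = [spt, lpt, fifo, lifo, rand]   # the agent picks one rule per decision point

if __name__ == "__main__":
    arrivals = sample_arrivals(rate=0.5, horizon=20.0)
    jobs = [{"arrival": a, "total_proc": float(rng.uniform(1, 10))} for a in arrivals]
    if jobs:
        picked = ACTIONS[0](jobs)  # e.g. the SPT action
        print(f"{len(jobs)} jobs arrived; SPT picks total_proc = {picked['total_proc']:.2f}")
```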
Abstract (English)
      For the dynamic flow shop scheduling problem (DFSP), an adaptive deep reinforcement learning algorithm (ADRLA) is proposed to minimize the maximum completion time. Firstly, the dynamic arrival of new jobs in the DFSP is modeled as a Poisson process, and the solving process of the DFSP is described by a Markov decision process (MDP), so as to transform the DFSP into a sequential decision problem that can be solved by reinforcement learning. Then, according to the characteristics of the DFSP scheduling model, a state feature vector with good state-feature discrimination and generalization is designed, and on this basis five specific actions (i.e., scheduling rules) are proposed to select the job to be processed next; meanwhile, a reward function based on the problem characteristics is constructed to obtain the evaluation value of each executed action (i.e., the reward value), thus determining the three basic elements of ADRLA. Furthermore, the deep double Q-network (DDQN) is used as the agent in ADRLA to make scheduling decisions. After training with a data set determined by a small number of small-scale DFSPs (i.e., the data of the three basic elements on different problems), the agent can accurately characterize the nonlinear relationship between the state feature vector and the Q-value vector (composed of the Q-value of each action) for DFSPs of different scales, so as to carry out adaptive real-time scheduling for DFSPs of various scales. Finally, simulation experiments on different test problems and comparisons with existing algorithms verify the effectiveness and real-time performance of the proposed ADRLA in solving the DFSP.
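As a rough, self-contained illustration of the DDQN update mentioned above, in which an online network selects the greedy next action while a separate target network evaluates it, here is a minimal PyTorch sketch. The state dimension, network sizes, discount factor, and optimizer settings are placeholder assumptions and do not reproduce the paper's actual state features, scheduling rules, or training configuration.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 6, 5, 0.95   # assumed sizes; one Q-value per scheduling rule

def make_q_net():
    # Small MLP mapping a state-feature vector to a Q-value vector over the actions.
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

online, target = make_q_net(), make_q_net()
target.load_state_dict(online.state_dict())   # target starts as a copy of the online net
opt = torch.optim.Adam(online.parameters(), lr=1e-3)

def ddqn_step(s, a, r, s_next, done):
    """One Double-DQN update: the online net selects the next action,
    the target net evaluates it (decoupling selection from evaluation)."""
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)          # Q(s, a) for the taken actions
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=1, keepdim=True)     # greedy action from online net
        q_next = target(s_next).gather(1, a_star).squeeze(1)    # evaluated by target net
        y = r + GAMMA * (1.0 - done) * q_next                   # bootstrapped target
    loss = nn.functional.mse_loss(q, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with a random mini-batch of transitions (state, action, reward, next state, done).
B = 32
loss = ddqn_step(torch.randn(B, STATE_DIM), torch.randint(0, N_ACTIONS, (B,)),
                 torch.randn(B), torch.randn(B, STATE_DIM), torch.zeros(B))
```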