Simulation research of the applicability of Q-learning algorithm in CSPS systems with non-Poisson part flow

SU Na; TANG Hao; DAI Fei; WANG Bin; ZHOU Lei

Cite this article: SU Na, TANG Hao, DAI Fei, WANG Bin, ZHOU Lei. Simulation research of the applicability of Q-learning algorithm in CSPS systems with non-Poisson part flow[J]. Control Theory & Applications, 2020, 37(12): 2591-2600.

Simulation research of the applicability of Q-learning algorithm in CSPS systems with non-Poisson part flow

Received: 2018-10-11  Revised: 2020-07-15

DOI: 10.7641/CTA.2020.80782

2020,37(12):2591-2600

Keywords: conveyor-serviced production station; Markov-modulated Poisson process; semi-Markov-modulated Poisson process; Q-learning

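Funding: Supported by the National Natural Science Foundation of China (61573126), the National Key Research and Development Program of China (2017YFGH002010), and the Fundamental Research Funds for the Central Universities (JZ2016YYPY0052).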

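Author	Affiliation	E-mail
SU Na	School of Electrical Engineering and Automation, Hefei University of Technology	suna_nasu@mail.hfut.edu.cn
TANG Hao^*	School of Electrical Engineering and Automation, Hefei University of Technology	htang@hfut.edu.cn
DAI Fei	School of Electrical Engineering and Automation, Hefei University of Technology
WANG Bin	School of Electrical Engineering and Automation, Hefei University of Technology
ZHOU Lei	School of Computer Science and Information Engineering, Hefei University of Technology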

Abstract

This paper studies the applicability of the Q-learning algorithm when parts arrive according to a non-Poisson process, so that the conveyor-serviced production station (CSPS) system cannot be modeled as a semi-Markov decision process (SMDP). First, a Markov-modulated Poisson process (MMPP) and a semi-Markov-modulated Poisson process (SMMPP) are used to model the non-Poisson part flow; under the same average arrival rate, the performance of Q-learning is evaluated by simulation in each case and compared with its performance under a Poisson part flow. Second, for the non-Poisson part flow, the theoretical optimization obtained by taking the real-time statistical average arrival rate as the arrival rate of a standard Poisson part flow is examined. Finally, the performance of Q-learning for the CSPS system is discussed for a mixed non-Poisson part flow formed by superposing MMPP and SMMPP. The experiments show that Q-learning can still learn a fairly good control policy when parts arrive according to a non-Poisson process, which demonstrates the applicability of Q-learning in CSPS systems.
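To make the idea of an MMPP part flow with a prescribed average arrival rate concrete, the following is a minimal sketch and not the authors' simulator. It assumes a two-state modulating chain; the function names `simulate_mmpp` and `stationary_mean_rate` and all numerical rates are hypothetical and chosen only for illustration. Because exponential inter-arrival times are memoryless, the arrival clock can be restarted at every switch of the modulating state.

```python
import random


def simulate_mmpp(rates, switch_rates, horizon, seed=0):
    """Return sorted arrival times of a two-state MMPP on [0, horizon).

    rates[i]        : Poisson arrival rate while the modulating chain is in state i (> 0)
    switch_rates[i] : rate of leaving state i (exponential sojourn times, > 0)
    """
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while t < horizon:
        # Sojourn time of the modulating Markov chain in the current state.
        end = min(t + rng.expovariate(switch_rates[state]), horizon)
        # Poisson arrivals at the state-dependent rate during this sojourn.
        s = t
        while True:
            s += rng.expovariate(rates[state])
            if s >= end:
                break
            arrivals.append(s)
        t = end
        state = 1 - state  # two-state chain: always jump to the other state
    return arrivals


def stationary_mean_rate(rates, switch_rates):
    """Long-run average arrival rate of the two-state MMPP."""
    q0, q1 = switch_rates
    pi0 = q1 / (q0 + q1)  # stationary probability of state 0
    return pi0 * rates[0] + (1.0 - pi0) * rates[1]


if __name__ == "__main__":
    rates, switch_rates, horizon = (0.5, 2.0), (0.1, 0.1), 20000.0
    arrivals = simulate_mmpp(rates, switch_rates, horizon)
    print("theoretical mean rate:", stationary_mean_rate(rates, switch_rates))
    print("empirical mean rate:  ", len(arrivals) / horizon)
```

A Poisson part flow used for comparison would then be given the rate returned by `stationary_mean_rate`, so that both flows share the same average arrival rate. An SMMPP variant would replace the exponential sojourn draw with another sojourn-time distribution; in that case the stationary probabilities must be weighted by the mean sojourn times instead.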