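Application of cerebellar model articulation controller network to learning optimization control in conveyor-serviced production station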

ZHOU Lei; KONG Feng; TANG Hao; ZHANG Jian-jun

Cite this article: ZHOU Lei, KONG Feng, TANG Hao, ZHANG Jian-jun. Application of cerebellar model articulation controller network to learning optimization control in conveyor-serviced production station[J]. Control Theory & Applications, 2011, 28(11): 1665-1670.

Received: 2010-07-20; Revised: 2011-01-18

DOI: 10.7641/j.issn.1000-8152.2011.11.CCTA100836

2011,28(11):1665-1670

Keywords: conveyor-serviced production station; cerebellar model articulation controller; Q-learning; online policy iteration

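Funding: Supported by the National Natural Science Foundation of China (60873003, 61174186); the Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education (教外司留2008890); the Natural Science Foundation of Anhui Province (090412046); the Key Project of Natural Science Research of Anhui Higher Education Institutions (KJ2008A058, KJ2011A230); and the China-Japan International Science and Technology Cooperation Project (2011FA10440).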

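Author	Affiliation	E-mail
ZHOU Lei^*	School of Computer and Information, Hefei University of Technology	zhouleizhl@163.com
KONG Feng	School of Computer and Information, Hefei University of Technology
TANG Hao	School of Computer and Information, Hefei University of Technology; Engineering Research Center of Safety Critical Industrial Measurement and Control Technology, Ministry of Education
ZHANG Jian-jun	School of Computer and Information, Hefei University of Technology; Engineering Research Center of Safety Critical Industrial Measurement and Control Technology, Ministry of Education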

Abstract

This paper studies the optimal control of the look-ahead distance for a single-station conveyor-serviced production station (CSPS) system, with the aim of improving the system's operating efficiency. The CSPS optimal control problem is modeled as a semi-Markov decision process (SMDP). Because standard Q-learning has difficulty directly handling the CSPS control problem when the look-ahead distance is a continuous variable, Q-value function approximation by a cerebellar model articulation controller (CMAC) network is combined with online learning techniques, yielding an online Q-learning algorithm and a model-free online policy iteration algorithm. Simulation results show that the proposed algorithms improve both the learning speed and the precision of optimization.
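To make the idea of CMAC-based Q-value approximation over a continuous action more concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a one-dimensional CMAC built from overlapping, evenly shifted tilings, one approximator per discrete state, and a plain discounted one-step Q-learning target evaluated on a grid of candidate look-ahead distances. The names `CMAC`, `q_update`, `candidates`, `gamma`, and `lr` are all hypothetical, and the sojourn-time-dependent discounting of a full SMDP formulation is deliberately omitted.

```python
class CMAC:
    """Minimal 1-D CMAC (tile-coding) function approximator (illustrative only)."""

    def __init__(self, low, high, n_tiles=8, n_tilings=4):
        self.low = low
        self.n_tilings = n_tilings
        self.tile_width = (high - low) / n_tiles
        # One extra tile per tiling so shifted tilings still cover [low, high].
        self.tiles_per_tiling = n_tiles + 1
        self.weights = [0.0] * (n_tilings * self.tiles_per_tiling)

    def active_cells(self, x):
        """Return the index of the single active weight in each tiling."""
        cells = []
        for t in range(self.n_tilings):
            # Each tiling is shifted by a fraction of one tile width.
            offset = t * self.tile_width / self.n_tilings
            idx = int((x - self.low + offset) / self.tile_width)
            idx = min(max(idx, 0), self.tiles_per_tiling - 1)
            cells.append(t * self.tiles_per_tiling + idx)
        return cells

    def predict(self, x):
        """The output is the sum of the active weights."""
        return sum(self.weights[c] for c in self.active_cells(x))

    def update(self, x, target, lr=0.5):
        """Spread the scaled prediction error evenly over the active weights."""
        cells = self.active_cells(x)
        step = lr * (target - self.predict(x)) / self.n_tilings
        for c in cells:
            self.weights[c] += step


def q_update(q, s, a, reward, s_next, candidates, gamma=0.9, lr=0.5):
    """One Q-learning step; q maps each discrete state to a CMAC over actions.

    The maximum over next actions is approximated on a finite candidate grid,
    which is an assumption of this sketch.
    """
    best_next = max(q[s_next].predict(c) for c in candidates)
    q[s].update(a, reward + gamma * best_next, lr)
```

Because neighbouring look-ahead distances activate mostly the same cells, an update at one distance also shifts the estimates for nearby distances, which is the generalization property that lets a Q-learner work with a continuous control variable without enumerating every value.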