Application of cerebellar model articulation controller network to learning optimization control in conveyor-serviced production station
DOI: 10.7641/j.issn.1000-8152.2011.11.CCTA100836
Keywords: conveyor-serviced production station; cerebellar model articulation controller; Q-learning; online policy iteration
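Funding: National Natural Science Foundation of China (60873003, 61174186); Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education of China (教外司留2008890); Natural Science Foundation of Anhui Province (090412046); Key Provincial Natural Science Research Projects of Anhui Higher Education Institutions (KJ2008A058, KJ2011A230); China-Japan International Science and Technology Cooperation Project (2011FA10440).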
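ZHOU Lei (周雷)*, School of Computer and Information, Hefei University of Technology, zhouleizhl@163.com
KONG Feng (孔凤), School of Computer and Information, Hefei University of Technology
TANG Hao (唐昊), School of Computer and Information, Hefei University of Technology
ZHANG Jianjun (张建军), School of Computer and Information, Hefei University of Technology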
      This paper studies the optimal control of the look-ahead distance for a single-station conveyor-serviced production station (CSPS), with the aim of improving the system's operating efficiency. The CSPS optimal control problem is modeled as a semi-Markov decision process (SMDP). Because the look-ahead distance is a continuous variable, standard Q-learning has difficulty handling this optimal control problem directly. A cerebellar model articulation controller (CMAC) network is therefore used to approximate the Q-value function and is combined with online learning techniques, yielding online Q-learning and model-free online policy iteration algorithms. Simulation results show that the proposed algorithms improve both the learning speed and the precision of optimization.
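The abstract combines CMAC approximation of the Q-value function with Q-learning on an SMDP whose action (the look-ahead distance) is continuous. The sketch below illustrates that combination under stated assumptions, not the paper's actual algorithm. The state is a hypothetical discrete index (for example, an encoded station status). The CMAC is plain uniform tile coding over the action axis. The update uses a discounted SMDP form whose discount factor is exp(-beta * tau) over the sojourn time tau. The names `CMACQ`, `choose_action`, and `smdp_q_update`, the discount form, and all parameter values are illustrative choices.

```python
import math
import random


class CMACQ:
    """Hypothetical CMAC (uniform tile coding) approximation of Q(s, a) for a
    discrete state index s and a continuous action a in [a_min, a_max],
    e.g. a look-ahead distance. Not the paper's exact network layout."""

    def __init__(self, n_states, a_min, a_max, n_tilings=8, n_tiles=10):
        self.a_min, self.a_max = a_min, a_max
        self.n_tilings, self.n_tiles = n_tilings, n_tiles
        self.width = (a_max - a_min) / n_tiles  # width of one tile
        # One weight per (state, tiling, tile); the extra tile per tiling
        # covers the region pushed past a_max by the tiling offset.
        self.w = [[[0.0] * (n_tiles + 1) for _ in range(n_tilings)]
                  for _ in range(n_states)]

    def _active_tiles(self, a):
        """Yield one (tiling, tile) pair per tiling for action a."""
        a = min(max(a, self.a_min), self.a_max)  # clamp to the action range
        for t in range(self.n_tilings):
            offset = t * self.width / self.n_tilings  # staggered tilings
            idx = int((a - self.a_min + offset) / self.width)
            yield t, min(idx, self.n_tiles)

    def value(self, s, a):
        """Q(s, a) is the sum of the active weights, one per tiling."""
        return sum(self.w[s][t][i] for t, i in self._active_tiles(a))

    def update(self, s, a, target, alpha):
        """Move Q(s, a) toward target; the step is shared across tilings."""
        err = target - self.value(s, a)
        for t, i in self._active_tiles(a):
            self.w[s][t][i] += alpha * err / self.n_tilings


def choose_action(q, s, candidates, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over a finite grid of candidate distances."""
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda b: q.value(s, b))


def smdp_q_update(q, s, a, reward, tau, s_next, candidates,
                  alpha=0.1, beta=0.05):
    """One discounted SMDP Q-learning step (assumed form):
    target = reward + exp(-beta * tau) * max_b Q(s_next, b),
    where reward is the discounted reward earned during sojourn time tau."""
    best_next = max(q.value(s_next, b) for b in candidates)
    q.update(s, a, reward + math.exp(-beta * tau) * best_next, alpha)
```

Because the action is continuous, the greedy step here maximizes over a finite grid of candidate distances, which is itself an assumption. The overlapping tilings let an update at one look-ahead distance generalize to nearby distances while leaving distant ones untouched. This local generalization is what makes a CMAC a natural fit for a continuous action variable.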