Cite this article: LIU Zhi-bin, ZENG Xiao-qin, XU Yan, YU Ji-guo. Learning to control by neural networks using eligibility traces[J]. Control Theory & Applications, 2015, 32(7): 887-894.
Learning to control by neural networks using eligibility traces
Received: 2014-04-27  Revised: 2015-04-10
DOI  10.7641/CTA.2015.40367
  2015,32(7):887-894
Keywords  reinforcement learning  neural networks  eligibility traces  cart-pole system  gradient descent
Funding  Supported by the National Natural Science Foundation of China (61403205, 61373027, 60117089) and the Laboratory Open Fund of Qufu Normal University (sk201415).
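Author  Affiliation  E-mail
LIU Zhi-bin*  School of Information Science and Engineering, Qufu Normal University  lzbxian@163.com
ZENG Xiao-qin  College of Computer and Information, Hohai University
XU Yan  College of Information Science and Technology, Nanjing Agricultural University
YU Ji-guo  School of Information Science and Engineering, Qufu Normal University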
Abstract
      Reinforcement learning is an important approach to adaptive problems and is widely applied to learning control in continuous state spaces, but it suffers from low learning efficiency and slow convergence. Building on back-propagation (BP) neural networks and eligibility traces, we propose an algorithm that performs multi-step updates during reinforcement learning. The algorithm solves the problem of propagating the local gradient of the output layer back to the hidden-layer nodes, so that the hidden-layer weights are updated quickly; a complete description of the algorithm is provided. We also propose a modified residual method that, during network training, combines the weight updates of each layer through an optimized linear weighting. It obtains both the learning speed of the direct gradient method and the convergence properties of the residual gradient method. Applying it to the hidden-layer weight updates improves the convergence of the value function. The algorithm is verified and analyzed in a simulated cart-pole (inverted pendulum) balancing experiment. The results show that, after a relatively short period of learning, the method successfully controls the cart-pole system and significantly improves learning efficiency.
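The ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose exact update rules are not given on this page. It assumes a value network with one tanh hidden layer and a linear output, accumulating eligibility traces for every weight, and a Baird-style mixing factor `phi` (0 gives the direct gradient update, 1 the residual gradient update). The function names, the hyperparameters, and the way `phi` is folded into the traces are all illustrative assumptions.

```python
import math
import random

def init_net(n_in, n_hidden, seed=0):
    """Small random weights: w1 is hidden-by-input, w2 is the output layer."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    return w1, w2

def zero_traces(w1, w2):
    """One eligibility trace per weight, all starting at zero."""
    return [[0.0] * len(row) for row in w1], [0.0] * len(w2)

def value_and_grads(w1, w2, x):
    """Return V(x) together with dV/dw1 and dV/dw2."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    v = sum(w * hj for w, hj in zip(w2, h))
    g2 = h[:]  # linear output: dV/dw2[j] = h[j]
    # Output-layer local gradient (1 here) propagated back to hidden unit j.
    g1 = [[w2[j] * (1.0 - h[j] ** 2) * xi for xi in x] for j in range(len(w1))]
    return v, g1, g2

def td_lambda_step(w1, w2, e1, e2, x, reward, x_next, done,
                   alpha=0.05, gamma=0.95, lam=0.8, phi=0.0):
    """One in-place TD(lambda)-style update; returns the TD error."""
    v, g1, g2 = value_and_grads(w1, w2, x)
    if done:
        # Terminal transition: the next-state value and gradients are zero.
        v_next = 0.0
        n1 = [[0.0] * len(x) for _ in w1]
        n2 = [0.0] * len(w2)
    else:
        v_next, n1, n2 = value_and_grads(w1, w2, x_next)
    delta = reward + gamma * v_next - v
    # Decay the traces, then add the mixed gradient direction (assumption:
    # phi blends the direct and residual gradient terms inside the trace).
    for j in range(len(w2)):
        e2[j] = gamma * lam * e2[j] + g2[j] - phi * gamma * n2[j]
        for i in range(len(x)):
            e1[j][i] = gamma * lam * e1[j][i] + g1[j][i] - phi * gamma * n1[j][i]
    # Multi-step credit assignment: every traced weight moves with delta.
    for j in range(len(w2)):
        w2[j] += alpha * delta * e2[j]
        for i in range(len(x)):
            w1[j][i] += alpha * delta * e1[j][i]
    return delta
```

In use, the traces would be reset with `zero_traces` at the start of each episode and `td_lambda_step` called once per transition. With `phi=0.0` the step reduces to ordinary semi-gradient TD(λ), and intermediate values of `phi` trade learning speed against the stability of the residual gradient direction.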