Cite this article: DAI Zhao-hui, YUAN Jiao-hong, WU Min, CHEN Xin. Dynamic hierarchical reinforcement learning based on probability model[J]. Control Theory and Technology, 2011, 28(11): 1595-1600.
|
Dynamic hierarchical reinforcement learning based on probability model
Abstract views: 2159 | Full-text views: 1391 | Received: 2010-05-09 | Revised: 2011-01-21
DOI: 10.7641/j.issn.1000-8152.2011.11.CCTA100535
2011, 28(11): 1595-1600
Keywords: dynamic hierarchical reinforcement-learning; Bayesian learning; state-transition probability model; agent
Funding: National Natural Science Foundation of China (60874042); China Postdoctoral Science Foundation, First-Class Grant (20080440177); China Postdoctoral Science Foundation, Special Grant (200902483); Ministry of Education New Teacher Fund for the Doctoral Program of Higher Education (20090162120068).
|
Abstract
To overcome the "curse of dimensionality" in large-scale reinforcement learning and the strong dependence of existing learning algorithms on prior knowledge, we propose a dynamic hierarchical reinforcement-learning method based on a probability model (DHRL-model). The method first models the state-transition probabilities by Bayesian learning and identifies key states automatically from the probability parameters of this model; it then generates state subspaces dynamically by clustering and learns the optimal policy under the resulting hierarchical structure. Simulation results show that the DHRL-model algorithm remarkably improves the learning efficiency of an agent in complex environments, and that it is applicable to large-scale learning in unknown environments.
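The first two stages the abstract describes (Bayesian modeling of state-transition probabilities, then key-state identification from the learned probability parameters) can be sketched as follows. This is a minimal illustration under assumed details, not the paper's actual algorithm: it uses a Dirichlet-multinomial count model for the transitions and a simple "reachable with high posterior probability from many distinct state-action pairs" score for key states; all names, thresholds, and the scoring rule are this sketch's own choices.

```python
from collections import defaultdict


class TransitionModel:
    """Bayesian state-transition model via Dirichlet-multinomial counts.

    A simplified stand-in for the paper's Bayesian learning step: each
    observed transition (s, a) -> s' increments a count, and the posterior
    mean under a symmetric Dirichlet prior gives P(s' | s, a).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                  # Dirichlet prior pseudo-count
        self.counts = defaultdict(lambda: defaultdict(float))
        self.successors = defaultdict(set)                  # observed successors of (s, a)

    def update(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts[(s, a)][s_next] += 1.0
        self.successors[(s, a)].add(s_next)

    def prob(self, s, a, s_next):
        """Posterior-mean estimate of P(s_next | s, a)."""
        succ = self.successors[(s, a)]
        total = sum(self.counts[(s, a)].values()) + self.alpha * len(succ)
        if total == 0.0:
            return 0.0
        return (self.counts[(s, a)][s_next] + self.alpha) / total


def key_states(model, min_sources=2, p_min=0.5):
    """Flag states reached with high posterior probability from at least
    `min_sources` distinct (state, action) pairs -- a crude proxy for
    identifying key states from the model's probability parameters."""
    sources = defaultdict(int)
    for (s, a), succ in model.successors.items():
        for s_next in succ:
            if model.prob(s, a, s_next) >= p_min:
                sources[s_next] += 1
    return {s for s, n in sources.items() if n >= min_sources}


if __name__ == "__main__":
    m = TransitionModel(alpha=1.0)
    for _ in range(3):
        m.update("A", "right", "B")     # A usually leads to B
    m.update("A", "right", "C")         # occasionally to C
    m.update("D", "up", "B")            # B is also reached from D
    m.update("D", "up", "B")
    print(m.prob("A", "right", "B"))    # posterior mean, (3+1)/(4+2) ≈ 0.667
    print(key_states(m))                # B has two high-probability sources
```

The identified key states would then serve as boundaries for clustering the state space into subspaces, over which the hierarchical policy is learned; that stage depends on the clustering scheme and is omitted here.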