Cite this article: DAI Zhao-hui, YUAN Jiao-hong, WU Min, CHEN Xin. Dynamic hierarchical reinforcement learning based on probability model[J]. Control Theory and Technology, 2011, 28(11): 1595-1600.
|
Dynamic hierarchical reinforcement learning based on probability model
Abstract views: 2159 | Full-text views: 1391 | Received: 2010-05-09 | Revised: 2011-01-21
DOI: 10.7641/j.issn.1000-8152.2011.11.CCTA100535
2011, 28(11): 1595-1600
Keywords: dynamic hierarchical reinforcement-learning; Bayesian learning; state-transition probability model; agent
Funding: National Natural Science Foundation of China (60874042); China Postdoctoral Science Foundation, First-Class Grant (20080440177); China Postdoctoral Science Foundation, Special Grant (200902483); Ministry of Education New Teacher Fund for the Doctoral Program of Higher Education (20090162120068).
|
Abstract
To overcome the "curse of dimensionality" in large-scale reinforcement learning and the strong dependence of existing learning algorithms on prior knowledge, we propose a dynamic hierarchical reinforcement-learning method based on a probability model (DHRL-model). The method first models the state-transition probabilities by Bayesian learning and identifies key states automatically from the probability parameters of this model; it then generates state subspaces dynamically by clustering and learns the optimal policy under the resulting hierarchical structure. Simulation results show that the DHRL-model algorithm remarkably improves the learning efficiency of an agent in complex environments, and that it is applicable to large-scale learning in unknown environments.
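The first two stages the abstract describes (Bayesian modeling of state-transition probabilities, then key-state identification from the learned probability parameters) can be sketched as follows. This is a minimal illustration under assumed details, not the paper's actual algorithm: it uses a Dirichlet-multinomial count model for the transitions and a simple "reachable with high posterior probability from many distinct state-action pairs" score for key states; all names, thresholds, and the scoring rule are this sketch's own choices.

```python
from collections import defaultdict


class TransitionModel:
    """Bayesian state-transition model via Dirichlet-multinomial counts.

    A simplified stand-in for the paper's Bayesian learning step: each
    observed transition (s, a) -> s' increments a count, and the posterior
    mean under a symmetric Dirichlet prior gives P(s' | s, a).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                  # Dirichlet prior pseudo-count
        self.counts = defaultdict(lambda: defaultdict(float))
        self.successors = defaultdict(set)                  # observed successors of (s, a)

    def update(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts[(s, a)][s_next] += 1.0
        self.successors[(s, a)].add(s_next)

    def prob(self, s, a, s_next):
        """Posterior-mean estimate of P(s_next | s, a)."""
        succ = self.successors[(s, a)]
        total = sum(self.counts[(s, a)].values()) + self.alpha * len(succ)
        if total == 0.0:
            return 0.0
        return (self.counts[(s, a)][s_next] + self.alpha) / total


def key_states(model, min_sources=2, p_min=0.5):
    """Flag states reached with high posterior probability from at least
    `min_sources` distinct (state, action) pairs -- a crude proxy for
    identifying key states from the model's probability parameters."""
    sources = defaultdict(int)
    for (s, a), succ in model.successors.items():
        for s_next in succ:
            if model.prob(s, a, s_next) >= p_min:
                sources[s_next] += 1
    return {s for s, n in sources.items() if n >= min_sources}


if __name__ == "__main__":
    m = TransitionModel(alpha=1.0)
    for _ in range(3):
        m.update("A", "right", "B")     # A usually leads to B
    m.update("A", "right", "C")         # occasionally to C
    m.update("D", "up", "B")            # B is also reached from D
    m.update("D", "up", "B")
    print(m.prob("A", "right", "B"))    # posterior mean, (3+1)/(4+2) ≈ 0.667
    print(key_states(m))                # B has two high-probability sources
```

The identified key states would then serve as boundaries for clustering the state space into subspaces, over which the hierarchical policy is learned; that stage depends on the clustering scheme and is omitted here.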