Cite this article: WU Pei-liang, ZHANG Yan, MAO Bing-yi, CHEN Wen-bai, GAO Guo-wei. Robot manipulation skills learning for sparse rewards[J]. Control Theory & Applications, 2024, 41(1): 99-108.
Robot manipulation skills learning for sparse rewards
Received: 2022-02-18  Revised: 2023-10-04
DOI  10.7641/CTA.2022.20121
  2024,41(1):99-108
Keywords  robot manipulation skills learning; reinforcement learning; sparse reward; maximum entropy methods; adaptive temperature parameters; meta-learning
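Funding  Supported by the National Key Research and Development Program of China (2018YFB1308300), the Regional Joint Fund of the National Natural Science Foundation of China (U20A20167), the Beijing Natural Science Foundation (4202026), and the Natural Science Foundation of Hebei Province (F202103079).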
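Author  Affiliation  E-mail
WU Pei-liang*  Yanshan University  peiliangwu@gmail.com
ZHANG Yan  Yanshan University
MAO Bing-yi  Yanshan University
CHEN Wen-bai  Beijing Information Science and Technology University
GAO Guo-wei  Beijing Information Science and Technology University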
Abstract
      Robot manipulation skills learning based on deep reinforcement learning has become a research hotspot, but the sparse rewards of such tasks make learning inefficient. This paper proposes a meta-learning-based double experience replay buffer adaptive soft hindsight experience replay (DAS-HER) algorithm and applies it to manipulation skills learning with sparse rewards. First, building on the soft hindsight experience replay (SHER) algorithm, a simplified value function that improves the algorithm's efficiency is derived, and a temperature adaptation strategy is added that dynamically adjusts the temperature parameter to suit different task environments. Second, drawing on meta-learning, the experience replay buffer is partitioned, and the ratio of real sampled data to constructed virtual data is adjusted dynamically during training, yielding the DAS-HER algorithm. Third, DAS-HER is applied to robot manipulation skills learning to build a general framework for learning manipulation skills in sparse-reward environments. Finally, comparative experiments on eight tasks in the Fetch and Hand environments under MuJoCo show that the proposed algorithm outperforms the other algorithms in both training efficiency and success rate.
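The ingredients named in the abstract (hindsight relabeling under a sparse reward, a mix of real and virtual samples, and an adaptive temperature) can be illustrated with a minimal Python sketch. This is not the authors' DAS-HER implementation. It assumes the common 0/-1 goal-reaching reward, the "final" relabeling strategy from standard hindsight experience replay, and a temperature step of the kind used in soft actor-critic style methods. All names (`sparse_reward`, `relabel_final`, `sample_mixed`, `update_log_alpha`) and the tolerance and learning-rate values are hypothetical. The abstract does not say how the real-to-virtual ratio is adapted, so `real_ratio` is simply passed in.

```python
import math
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Assumed sparse reward: 0 if the goal is reached within tol, else -1."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def relabel_final(episode, tol=0.05):
    """Build 'virtual' transitions by pretending the goal was the
    last achieved state of the episode ('final' strategy)."""
    new_goal = episode[-1]["achieved"]
    return [
        {**t, "goal": new_goal,
         "reward": sparse_reward(t["achieved"], new_goal, tol)}
        for t in episode
    ]


def sample_mixed(real_buf, virtual_buf, batch_size, real_ratio, rng):
    """Draw a batch from two buffers; real_ratio is the fraction of
    real transitions (how it is adapted is NOT given in the abstract)."""
    n_real = round(batch_size * real_ratio)
    batch = [rng.choice(real_buf) for _ in range(n_real)]
    batch += [rng.choice(virtual_buf) for _ in range(batch_size - n_real)]
    return batch


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on the loss
    -log_alpha * (mean(log_pi) + target_entropy), an assumed
    soft-actor-critic-style temperature objective."""
    mean_log_pi = sum(log_probs) / len(log_probs)
    grad = -(mean_log_pi + target_entropy)
    return log_alpha - lr * grad


# Hypothetical 2-D episode that never reaches its original goal.
goal = (1.0, 1.0)
episode = [
    {"achieved": p, "goal": goal, "reward": sparse_reward(p, goal)}
    for p in [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1)]
]
virtual = relabel_final(episode)
batch = sample_mixed(episode, virtual, batch_size=8, real_ratio=0.25,
                     rng=random.Random(0))
new_log_alpha = update_log_alpha(0.0, [-0.5, -0.5], target_entropy=1.0, lr=0.1)
```

In this toy episode every real transition carries reward -1, while the relabeled copy of the last transition carries reward 0, which is the learning signal that hindsight relabeling is meant to supply. The temperature step raises `log_alpha` when the policy's entropy estimate falls below the target and lowers it otherwise.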