Robot manipulation skills learning for sparse rewards

WU Pei-liang; ZHANG Yan; MAO Bing-yi; CHEN Wen-bai; GAO Guo-wei

Cite this article: WU Pei-liang, ZHANG Yan, MAO Bing-yi, CHEN Wen-bai, GAO Guo-wei. Robot manipulation skills learning for sparse rewards[J]. Control Theory & Applications, 2024, 41(1): 99-108.

Received: 2022-02-18 Revised: 2023-10-04

DOI: 10.7641/CTA.2022.20121

2024,41(1):99-108

Keywords: robot manipulation skills learning; reinforcement learning; sparse reward; maximum entropy methods; adaptive temperature parameters; meta-learning

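Funding: Supported by the National Key R&D Program of China (2018YFB1308300), the Regional Joint Fund of the National Natural Science Foundation of China (U20A20167), the Beijing Natural Science Foundation (4202026), and the Natural Science Foundation of Hebei Province (F202103079).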

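Author	Affiliation	E-mail
WU Pei-liang^*	Yanshan University	peiliangwu@gmail.com
ZHANG Yan	Yanshan University
MAO Bing-yi	Yanshan University
CHEN Wen-bai	Beijing Information Science and Technology University
GAO Guo-wei	Beijing Information Science and Technology University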

Abstract

Robot manipulation skills learning based on deep reinforcement learning has become a research hotspot, but the sparse-reward nature of these tasks makes learning inefficient. This paper proposes a meta-learning-based double experience replay buffer adaptive soft hindsight experience replay (DAS-HER) algorithm and applies it to manipulation skills learning under sparse rewards. First, building on the soft hindsight experience replay (SHER) algorithm, a simplified value function that improves the algorithm's efficiency is derived, and a temperature adaptation strategy is added that dynamically adjusts the temperature parameter to suit different task environments. Second, drawing on meta-learning, the experience replay buffer is split in two, and the ratio of real sampled data to constructed virtual data is adjusted dynamically during training, which yields the DAS-HER algorithm. Third, DAS-HER is applied to robot manipulation skills learning to build a general framework for learning such skills in sparse-reward environments. Finally, comparative experiments on eight tasks in the Fetch and Hand environments under MuJoCo show that the proposed algorithm outperforms the compared algorithms in both training efficiency and success rate.
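
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind hindsight relabeling with two replay buffers, not the authors' DAS-HER algorithm. It assumes a sparse reward of 0 on success and -1 otherwise, the generic "future" relabeling strategy from hindsight experience replay, and a mixing ratio passed in by the caller (the paper adjusts this ratio dynamically; how it does so is not stated here). All function names, dictionary keys, and parameters (`sparse_reward`, `hindsight_relabel`, `sample_mixed`, `tol`, `virtual_ratio`) are hypothetical.

```python
import random

import numpy as np


def sparse_reward(achieved_goal, goal, tol=0.05):
    """Assumed sparse reward: 0.0 within `tol` of the goal, -1.0 otherwise."""
    dist = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(goal))
    return 0.0 if dist <= tol else -1.0


def hindsight_relabel(episode, rng):
    """Build virtual transitions by swapping in a goal that was actually
    achieved at the same or a later step of the episode ("future" strategy)."""
    relabeled = []
    for t, step in enumerate(episode):
        future = rng.randrange(t, len(episode))
        new_goal = episode[future]["achieved_goal"]
        relabeled.append({
            **step,
            "goal": new_goal,
            "reward": sparse_reward(step["achieved_goal"], new_goal),
        })
    return relabeled


def sample_mixed(real_buffer, virtual_buffer, batch_size, virtual_ratio, rng):
    """Draw a batch from the real and virtual buffers.
    `virtual_ratio` is the fraction of the batch taken from virtual data;
    here it is a plain argument rather than a learned or scheduled value."""
    n_virtual = min(int(round(batch_size * virtual_ratio)), len(virtual_buffer))
    n_real = min(batch_size - n_virtual, len(real_buffer))
    return rng.sample(real_buffer, n_real) + rng.sample(virtual_buffer, n_virtual)


# Toy episode: the object moves along a line and never reaches the real goal,
# so every real transition carries the failure reward.
rng = random.Random(0)
goal = np.array([1.0, 1.0])
episode = []
for t in range(5):
    achieved = np.array([0.1 * (t + 1), 0.0])
    episode.append({
        "achieved_goal": achieved,
        "goal": goal,
        "reward": sparse_reward(achieved, goal),
    })

real_buffer = list(episode)
virtual_buffer = hindsight_relabel(episode, rng)
batch = sample_mixed(real_buffer, virtual_buffer, batch_size=4,
                     virtual_ratio=0.5, rng=rng)
```

In this toy episode every real transition is a failure, while the relabeled copy of the final step is always a success because its substituted goal is the state it actually reached. That is the sense in which virtual data can supply a learning signal when real rewards are sparse; raising or lowering `virtual_ratio` changes how much of each batch comes from such relabeled transitions.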