Cite this article: | WU Pei-liang, ZHANG Yan, MAO Bing-yi, CHEN Wen-bai, GAO Guo-wei. Robot manipulation skills learning for sparse rewards[J]. Control Theory & Applications (控制理论与应用), 2024, 41(1): 99-108. |
|
Robot manipulation skills learning for sparse rewards |
Received: 2022-02-18 Revised: 2023-10-04 |
DOI 10.7641/CTA.2022.20121 |
2024,41(1):99-108 |
Keywords: robot manipulation skills learning; reinforcement learning; sparse reward; maximum entropy methods; adaptive temperature parameters; meta-learning |
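Funding: Supported by the National Key Research and Development Program of China (2018YFB1308300), the Regional Joint Fund of the National Natural Science Foundation of China (U20A20167), the Beijing Natural Science Foundation (4202026), and the Natural Science Foundation of Hebei Province (F202103079). |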
|
Abstract |
Robot manipulation skills learning based on deep reinforcement learning has become a research hotspot,
but the sparse rewards of manipulation tasks make learning inefficient. This paper proposes a
meta-learning-based double experience replay buffer adaptive soft hindsight experience replay (DAS-HER)
algorithm and applies it to robot manipulation skills learning under sparse rewards. First, building on
the soft hindsight experience replay (SHER) algorithm, a simplified value function that improves the
efficiency of the algorithm is derived, and a temperature adaptive adjustment strategy is added that
dynamically adjusts the temperature parameter to suit different task environments. Second, drawing on
meta-learning, the experience replay buffer is split, and the ratio of real sampled data to constructed
virtual data is adjusted dynamically during training, which yields the DAS-HER algorithm. Third, DAS-HER
is applied to robot manipulation skills learning to build a general framework for learning such skills
in sparse-reward environments. Finally, comparative experiments on eight tasks are conducted in the
Fetch and Hand environments under MuJoCo, and the results show that the proposed algorithm outperforms
the other algorithms in both training efficiency and success rate. |
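The abstract's idea of keeping really sampled data and constructed virtual data in separate buffers, and mixing them at an adjustable ratio, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `DualReplay`, the distance tolerance, the "final state as goal" relabeling rule, and the caller-supplied `real_ratio` are all assumptions made for illustration. The abstract does not give the meta-learning rule that adapts the ratio, so the sketch leaves that choice to the caller.

```python
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Sparse reward: 0 if the achieved state is within tol of the goal, else -1.
    The tolerance value is an illustrative assumption."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


class DualReplay:
    """Hypothetical two-buffer replay: real transitions and hindsight-relabeled ones."""

    def __init__(self, seed=0):
        self.real = []      # transitions with the original goal
        self.virtual = []   # transitions relabeled with an achieved goal
        self.rng = random.Random(seed)

    def store_episode(self, episode):
        """episode: list of (obs, action, achieved_next, goal) tuples."""
        # Assumed relabeling rule: treat the final achieved state as the goal.
        hindsight_goal = episode[-1][2]
        for obs, action, achieved_next, goal in episode:
            self.real.append(
                (obs, action, sparse_reward(achieved_next, goal), goal))
            self.virtual.append(
                (obs, action, sparse_reward(achieved_next, hindsight_goal),
                 hindsight_goal))

    def sample(self, batch_size, real_ratio):
        """Draw a batch in which roughly real_ratio of the items are real data."""
        n_real = max(0, min(batch_size, int(round(batch_size * real_ratio))))
        # Sampling with replacement keeps the sketch valid for small buffers.
        batch = self.rng.choices(self.real, k=n_real)
        batch += self.rng.choices(self.virtual, k=batch_size - n_real)
        return batch


if __name__ == "__main__":
    buf = DualReplay(seed=0)
    # A 1-D episode that moves toward the goal (1.0,) but never reaches it.
    buf.store_episode([
        ((0.0,), 0, (0.1,), (1.0,)),
        ((0.1,), 0, (0.2,), (1.0,)),
        ((0.2,), 0, (0.3,), (1.0,)),
    ])
    print([t[2] for t in buf.real])
    print([t[2] for t in buf.virtual])
```

In this toy episode every real transition carries the failure reward, while the relabeled copy of the last transition counts as a success. That is the mechanism by which hindsight relabeling gives a learning signal under sparse rewards. A training loop would vary `real_ratio` over time according to whatever adaptation rule is chosen.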
|
|
|
|
|