Cite this article: | 朱晓庆,刘鑫源,阮晓钢,张思远,李春阳,李鹏.融合元学习和PPO算法的四足机器人运动技能学习方法[J].控制理论与应用,2024,41(1):155~162. |
ZHU Xiao-qing, LIU Xin-yuan, RUAN Xiao-gang, ZHANG Si-yuan, LI Chun-yang, LI Peng. A quadruped robot kinematic skill learning method integrating meta-learning and PPO algorithms [J]. Control Theory & Applications, 2024, 41(1): 155-162. |
|
融合元学习和PPO算法的四足机器人运动技能学习方法 |
A quadruped robot kinematic skill learning method integrating meta-learning and PPO algorithms |
Abstract views: 1328 Full-text views: 1746 Submitted: 2022-09-27 Revised: 2023-04-07 |
DOI 10.7641/CTA.2023.20847 |
2024,41(1):155-162 |
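Chinese keywords 四足机器人; 步态学习; 强化学习; 元学习 |
English keywords quadruped robot; gait learning; reinforcement learning; meta-learning |
Funding Supported by the National Natural Science Foundation of China (62103009) and the Beijing Natural Science Foundation (4202005). |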
|
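Chinese abstract (English translation) |
The ability to learn is a hallmark of intelligence in higher animals. To investigate the mechanism by which quadrupeds
learn motor skills, this paper studies the gait learning task for quadruped robots and reproduces the rhythmic gait
learning process of quadruped animals. In recent years, the proximal policy optimization (PPO) algorithm, a typical
representative of deep reinforcement learning, has been widely used for quadruped robot gait learning; it performs well
experimentally and requires only a few hyperparameters. However, in scenarios with multidimensional inputs and outputs,
it tends to converge to a local optimum, which shows up as disordered gait rhythm signals and severe oscillation of the
center of gravity in the learned gait. To address these problems, and motivated by the strength of meta-learning in
capturing a high-dimensional abstract representation of the learning process, this paper proposes a meta proximal policy
optimization (MPPO) algorithm that combines the ideas of meta-learning and PPO and allows a quadruped robot to evolve
toward a better gait. Simulation experiments on the PyBullet platform show that the proposed algorithm enables a
quadruped robot to learn to walk, and comparative experiments against soft actor-critic (SAC) and PPO show that MPPO
yields more regular gait rhythm signals and a faster walking speed. |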
English abstract |
Learning ability is a hallmark of intelligence in higher animals. To explore the mechanism by which quadrupeds learn
motor skills, this paper studies the gait learning task for quadruped robots and reproduces the rhythmic gait learning
process of quadruped animals from scratch. In recent years, the proximal policy optimization (PPO) algorithm, a typical
representative of deep reinforcement learning, has been widely used in gait learning tasks for quadruped robots, with
good experimental results and few hyperparameters required. However, in scenarios with multidimensional inputs and
outputs, it tends to converge to a local optimum: in the experimental environment of this study, the gait rhythm
signals of the trained quadruped robot were irregular and its center of gravity oscillated severely. To solve these
problems, and building on the strength of meta-learning in capturing a high-dimensional abstract representation of the
learning process, this paper proposes a meta proximal policy optimization (MPPO) algorithm that combines meta-learning
and PPO. This algorithm enables quadruped robots to learn a better gait. Simulation results on the PyBullet platform
show that the proposed algorithm enables quadruped robots to learn walking skills. Compared with the soft actor-critic
(SAC) and PPO algorithms, the proposed MPPO algorithm produces more regular gait rhythm signals and a faster walking
speed. |
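For readers unfamiliar with PPO, the sketch below shows the standard clipped surrogate objective that PPO maximizes. It illustrates only the baseline algorithm named in the abstract. The abstract does not describe the MPPO update, so nothing here should be read as the authors' method. The function name `ppo_clip_objective`, its parameter names, and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current and the data-collecting policy. All names are illustrative.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s)
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps]
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # With identical policies the ratio is 1, so the objective is the mean advantage.
    print(ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update. The clip range is one of the few hyperparameters PPO needs.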
|
|
|
|
|