Cite this article: | ZHANG Chang-xin, ZHANG Xing-long, XU Xin, LU Yang. Safe reinforcement learning and its applications in robotics: A survey [J]. Control Theory & Applications, 2023, 40(12): 2090-2103. |
Safe reinforcement learning and its applications in robotics: A survey |
Received: 2023-04-20 Revised: 2023-12-04 |
DOI 10.7641/CTA.2023.30247 |
2023,40(12):2090-2103 |
Keywords: robotics; safe reinforcement learning; constrained Markov decision process; robustness |
Funding: Supported by the National Natural Science Foundation of China (62003361, U21A20518). |
|
Abstract |
Reinforcement learning is a class of machine learning methods that make sequential optimization decisions by interacting with the environment, and it has been applied to tasks such as games, recommendation systems, and natural language processing. However, ensuring safety remains a challenge when reinforcement learning algorithms are applied to real-world robotic systems. In recent years, safe reinforcement learning for robotic systems has become an active research direction and has attracted wide attention from both the robotics and reinforcement learning communities. Drawing on existing work, this paper surveys the important achievements and development trends in the theory and methods of safe reinforcement learning, with a focus on the applicability of existing methods to robotics. It first gives a general problem formulation of safe reinforcement learning. It then reviews the latest significant progress in the field from the perspectives of method and performance, including constrained policy optimization, control barrier functions, safety filters, and adversarial game training, as well as applications of safe reinforcement learning in ground mobile robots, unmanned aerial vehicles, and other robotic systems. Finally, future research directions in the field are discussed. |
|
|
|
|
|