Safe reinforcement learning and its applications in robotics: A survey

ZHANG Chang-xin; ZHANG Xing-long; XU Xin; LU Yang

Cite this article: ZHANG Chang-xin, ZHANG Xing-long, XU Xin, LU Yang. Safe reinforcement learning and its applications in robotics: A survey[J]. Control Theory & Applications, 2023, 40(12): 2090-2103.

Received: 2023-04-20; Revised: 2023-12-04

DOI: 10.7641/CTA.2023.30247

Keywords: robotics; safe reinforcement learning; constrained Markov decision process; robustness

Funding: Supported by the National Natural Science Foundation of China (62003361, U21A20518).

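Author	Affiliation	E-mail
ZHANG Chang-xin	National University of Defense Technology	changxzhang@163.com
ZHANG Xing-long	National University of Defense Technology
XU Xin*	National University of Defense Technology	xuxin_mail@263.net
LU Yang	National University of Defense Technology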

Abstract

Reinforcement learning (RL) is a class of machine learning methods that learn to make sequential decisions optimally by interacting with the environment. It has been applied to games, recommendation systems, and natural language processing. However, ensuring safety remains a challenge when RL algorithms are applied to real-world robotic systems. In recent years, safe RL methods for robotic systems have become an active research direction, attracting extensive attention in both the robotics and RL communities. Drawing on existing work, this paper surveys the important achievements and development trends of safe RL theory and methods, with a focus on their applicability to robotics. It first gives a general problem formulation of safe RL. It then reviews the latest significant progress in the field from the perspectives of methods and performance, covering constrained policy optimization, control barrier functions, safety filters, and adversarial game-based training, as well as applications of safe RL to ground mobile robots such as autonomous vehicles, unmanned aerial vehicles, and other robotic systems. Finally, it discusses future research directions in this field.