Cite this article: ZHANG Chang-xin, ZHANG Xing-long, XU Xin, LU Yang. Safe reinforcement learning and its applications in robotics: A survey[J]. Control Theory & Applications, 2023, 40(12): 2090-2103.
Safe reinforcement learning and its applications in robotics: A survey
Received: 2023-04-20  Revised: 2023-12-04
DOI  10.7641/CTA.2023.30247
Issue  2023, 40(12): 2090-2103
Keywords  robotics  safe reinforcement learning  constrained Markov decision process  robustness
Funding  Supported by the National Natural Science Foundation of China (62003361, U21A20518).
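Author  Affiliation  E-mail
ZHANG Chang-xin  National University of Defense Technology  changxzhang@163.com
ZHANG Xing-long  National University of Defense Technology
XU Xin*  National University of Defense Technology  xuxin_mail@263.net
LU Yang  National University of Defense Technology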
Abstract
      Reinforcement learning is a class of machine learning methods that make sequential, optimized decisions by interacting with the environment, and it has been applied to games, recommendation systems, and natural language processing. However, guaranteeing safety remains a challenge when reinforcement learning algorithms are applied to real-world robotic systems. In recent years, safe reinforcement learning for robotic systems has become an active research direction and has attracted wide attention from both the robotics and the reinforcement learning communities. Drawing on existing work, this paper surveys the major results and development trends in the theory and methods of safe reinforcement learning, with a focus on how well existing methods apply to robotics. The paper first gives a general problem formulation of safe reinforcement learning. It then reviews the latest significant progress in the field from the perspectives of method and performance, covering constrained policy optimization, control barrier functions, safety filters, and adversarial game training, together with applications to ground mobile robots such as autonomous driving vehicles, unmanned aerial vehicles, and other robotic systems. Finally, future research directions in the field are discussed.