Rapidly-exploring random tree algorithm for path re-planning based on reinforcement learning under the peculiar environment

Citation: ZOU Qi-jie, LIU Shi-hui, ZHANG Yue, HOU Ying-li. Rapidly-exploring random tree algorithm for path re-planning based on reinforcement learning under the peculiar environment[J]. Control Theory & Applications, 2020, 37(8): 1737–1748.

Received: 2019-07-26; Revised: 2020-03-28

DOI: 10.7641/CTA.2020.90622

Published: 2020, 37(8): 1737–1748

Keywords: rapidly-exploring random tree (RRT); Sarsa(λ); local path re-planning; mobile robots; peculiar environment

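Funding: Supported by the General Program of the National Natural Science Foundation of China (61673084) and the Natural Science Foundation of Liaoning Province (2019–ZD–0578).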

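Author	Affiliation	E-mail
ZOU Qi-jie (corresponding author)	Dalian University	jessie_zou_zou@163.com
LIU Shi-hui	Dalian University	1473245950@qq.com
ZHANG Yue	Dalian University
HOU Ying-li	Dalian University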

Abstract

Mobile robots plan paths inefficiently in unknown, peculiar environments such as U-shaped, narrow, and irregular channels. To address this, this paper proposes RL–RRT, a local path re-planning method in which reinforcement learning (RL) drives a rapidly-exploring random tree (RRT). The method uses Sarsa(λ) to optimize the expansion of the RRT random tree: it preserves RRT's random exploration in unknown environments while using Sarsa(λ) to reduce the cost of exploring invalid regions. Specifically, while satisfying the kinematic constraints of the mobile robot, RL–RRT defines a reward function, a goal-distance function, and a smoothness objective function for extended nodes. These prune invalid nodes and accelerate exploration, achieving multi-objective decision optimization for path planning. In simulation, RL–RRT is applied to a variety of unknown peculiar environments, and the results demonstrate its feasibility, effectiveness, and performance advantages.
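The abstract names the ingredients of RL–RRT but not their exact form. As a minimal sketch, the pieces might look like the following. It assumes a discrete set of eight expansion headings (`HEADINGS`). It also assumes a hypothetical node reward that combines a goal-distance progress term, a turning (smoothness) penalty, and a collision penalty, together with textbook tabular Sarsa(λ) using accumulating eligibility traces. All names, weights, and parameters (`node_reward`, `choose_heading`, `sarsa_lambda_update`, `w_goal`, `w_smooth`, `alpha`, `gamma`, `lam`) are illustrative assumptions, not the paper's definitions. How RL–RRT actually wires these into the RRT expansion loop is not described here.

```python
import math
import random
from collections import defaultdict

# Hypothetical discrete action set: 8 candidate expansion headings (radians).
HEADINGS = [k * math.pi / 4 for k in range(8)]


def wrap_angle(theta):
    """Map an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


def node_reward(parent, child, goal, prev_heading, collided,
                w_goal=1.0, w_smooth=0.5,
                collision_penalty=-10.0, goal_bonus=10.0, goal_tol=0.5):
    """Hypothetical reward for extending the tree from `parent` to `child`.

    Goal-distance term: how much closer `child` is to `goal` than `parent`.
    Smoothness term: penalty proportional to the heading change.
    Weights and penalties are illustrative, not taken from the paper.
    """
    if collided:
        return collision_penalty
    if math.dist(child, goal) <= goal_tol:
        return goal_bonus
    progress = math.dist(parent, goal) - math.dist(child, goal)
    heading = math.atan2(child[1] - parent[1], child[0] - parent[0])
    turn = abs(wrap_angle(heading - prev_heading))
    return w_goal * progress - w_smooth * turn


def choose_heading(Q, state, epsilon, rng):
    """Epsilon-greedy choice of a heading index for the node in `state`."""
    if rng.random() < epsilon:
        # Uniformly random heading: keeps some RRT-style random exploration.
        return rng.randrange(len(HEADINGS))
    return max(range(len(HEADINGS)), key=lambda a: Q[(state, a)])


def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next,
                        alpha=0.5, gamma=0.9, lam=0.8, terminal=False):
    """One tabular Sarsa(lambda) step with accumulating eligibility traces.

    Q and E map (state, action) pairs to values; both are updated in place.
    Returns the TD error delta.
    """
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    delta = target - Q[(s, a)]
    E[(s, a)] += 1.0                      # accumulate trace for the visited pair
    for key in list(E):
        Q[key] += alpha * delta * E[key]  # credit every recently visited pair
        E[key] *= gamma * lam             # then decay its trace
    return delta


if __name__ == "__main__":
    Q, E = defaultdict(float), defaultdict(float)
    # Two consecutive extensions along +x toward a goal at (5, 0).
    r1 = node_reward((0, 0), (1, 0), (5, 0), prev_heading=0.0, collided=False)
    r2 = node_reward((1, 0), (2, 0), (5, 0), prev_heading=0.0, collided=False)
    sarsa_lambda_update(Q, E, "n0", 0, r1, "n1", 0)
    sarsa_lambda_update(Q, E, "n1", 0, r2, "n2", 0)
    print(Q[("n0", 0)], Q[("n1", 0)])
    print(choose_heading(Q, "n0", epsilon=0.1, rng=random.Random(0)))
```

With λ > 0, the second update in the demo also raises the value of the first extension ("n0", 0) through its decayed trace. This is how Sarsa(λ) passes credit back along a sequence of expansions faster than one-step Sarsa.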