Cite this article: LUO Ao, XIAO Wen-bin, ZHOU Qi, LU Ren-quan. Optimal control for a class of nonlinear systems with input constraints based on reinforcement learning[J]. Control Theory & Applications (控制理论与应用), 2022, 39(1): 154-164.
Optimal control for a class of nonlinear systems with input constraints based on reinforcement learning
Received: 2020-12-14  Revised: 2021-06-25
DOI  10.7641/CTA.2021.00898
  2022,39(1):154-164
Keywords  input constraints  immeasurable states  optimal control  reinforcement learning  backstepping
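Funding  Supported by the National Natural Science Foundation of China (62121004, 61973091), the Local Innovative and Entrepreneurial Team Project of the Guangdong Special Support Program (2019BT02X353), and the Key-Area Research and Development Program of Guangdong Province (2021B0101410005).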
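Author  Affiliation  Postal code
LUO Ao  Guangdong University of Technology  510006
XIAO Wen-bin  Guangdong University of Technology
ZHOU Qi*  Guangdong University of Technology  510006
LU Ren-quan  Guangdong University of Technology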
Abstract
      This paper addresses the optimal control problem for a class of nonlinear systems with input constraints and immeasurable states. An optimal tracking control strategy is proposed by combining backstepping with an approximate optimal algorithm based on the actor-critic structure from reinforcement learning. First, a nonlinear observer is constructed with neural networks to estimate the immeasurable states. Then, a non-quadratic utility function is designed to handle the input constraints. Compared with existing optimal control methods, the proposed method retains the advantage of backstepping in handling tracking problems for nth-order systems, ensures that all virtual controllers are optimal, and simplifies the controller design process. Finally, based on Lyapunov stability theory, all signals in the closed-loop system are proven to be uniformly ultimately bounded. Simulation results verify the effectiveness of the proposed method.
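The abstract does not state the form of the non-quadratic function used to handle the input constraints. As an illustration only, the sketch below assumes a form that is common in the constrained-input optimal control literature, W(u) = 2 r λ ∫₀ᵘ atanh(v/λ) dv, for a scalar input bounded by |u| < λ. The names `lam`, `r`, `g`, and `grad_v` are hypothetical and are not taken from the paper. Minimizing W(u) + grad_v · g · u over u gives a tanh-saturated control law that cannot exceed the bound.

```python
import math


def nonquadratic_cost(u, lam, r=1.0):
    """Closed form of W(u) = 2*r*lam * integral_0^u atanh(v/lam) dv.

    Assumed illustrative form, not necessarily the one used in the paper.
    Defined only for |u| < lam.
    """
    if abs(u) >= lam:
        raise ValueError("input must satisfy |u| < lam")
    return (2.0 * lam * r * u * math.atanh(u / lam)
            + lam ** 2 * r * math.log(1.0 - (u / lam) ** 2))


def saturated_control(grad_v, g, lam, r=1.0):
    """Minimizer of W(u) + grad_v * g * u for a scalar input.

    grad_v: gradient of the value function (hypothetical scalar here).
    g:      input gain of the system (hypothetical scalar here).
    The result always lies within [-lam, lam].
    """
    return -lam * math.tanh(g * grad_v / (2.0 * lam * r))


if __name__ == "__main__":
    lam = 2.0
    for grad_v in (0.1, 1.0, 10.0, 1000.0):
        u = saturated_control(grad_v, g=1.5, lam=lam)
        print(f"grad_v={grad_v:8.1f}  u={u:+.6f}")
```

For small inputs this cost behaves like the quadratic r·u², so the assumed form reduces to the familiar unconstrained case when the bound is far from active.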