Linear Quadratic Optimal Control Method Based on Output Feedback Inverse Reinforcement Q-Learning

LIU Wen; FAN Jia-lu; XUE Wen-qian

Cite this article:	LIU Wen, FAN Jia-lu, XUE Wen-qian. Linear Quadratic Optimal Control Method Based on Output Feedback Inverse Reinforcement Q-Learning[J]. Control Theory & Applications, 2024, 41(8): 1469-1479.

Received: 2022-06-21  Revised: 2024-02-26

DOI: 10.7641/CTA.2023.20551

2024,41(8):1469-1479

Keywords: inverse reinforcement learning, Q-learning, output feedback, data-driven optimal control

Funding: Major Program of the National Natural Science Foundation of China (61991400); Liaoning Revitalization Talents Program (XLYC2007135)

Author	Affiliation	E-mail
LIU Wen	Northeastern University	liuwen144208@163.com
FAN Jia-lu^*	Northeastern University	jlfan@mail.neu.edu.cn
XUE Wen-qian	Northeastern University

Abstract

This paper addresses the linear quadratic optimal control problem for linear discrete-time systems with unknown model parameters and unmeasurable states, and proposes a data-driven optimal control method based on output feedback inverse reinforcement Q-learning. The method uses only the system's input and output data to determine suitable quadratic performance index weights and the optimal control law simultaneously, so that the system trajectory matches the reference trajectory. First, a parameter correction equation is proposed and combined with inverse optimal control to obtain a model-based inverse reinforcement learning optimal control framework, which corrects the output feedback control law parameters and the performance index weights. On this basis, the idea of reinforcement Q-learning is introduced to obtain a data-driven output feedback inverse reinforcement Q-learning optimal control method, which requires no knowledge of the system model parameters and solves for the output feedback control law parameters and the performance index weights using only historical input and output data. Theoretical analysis and simulation experiments verify the effectiveness of the proposed method.
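For readers unfamiliar with the Q-function view of linear quadratic control that the abstract builds on, the following is a minimal sketch of textbook policy iteration on the quadratic Q-function kernel. It is not the method proposed in the paper: it assumes a known model and full state feedback, whereas the paper works with unknown parameters, output feedback, and the inverse problem of recovering the weights. The matrices `A`, `B`, `C`, `Qy`, `R` and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def q_kernel(A, B, C, Qy, R, P):
    """Kernel H of Q(x, u) = [x; u]^T H [x; u] for the value function x^T P x."""
    Hxx = C.T @ Qy @ C + A.T @ P @ A
    Hxu = A.T @ P @ B
    Huu = R + B.T @ P @ B
    return np.block([[Hxx, Hxu], [Hxu.T, Huu]])

def evaluate_policy(A, B, C, Qy, R, K):
    """Solve P = Acl^T P Acl + C^T Qy C + K^T R K for the policy u = -K x."""
    Acl = A - B @ K
    M = C.T @ Qy @ C + K.T @ R @ K
    n = A.shape[0]
    # Vectorized Lyapunov equation: (I - Acl^T kron Acl^T) vec(P) = vec(M)
    lhs = np.eye(n * n) - np.kron(Acl.T, Acl.T)
    P = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_iteration(A, B, C, Qy, R, K0, iters=30):
    """Alternate policy evaluation and greedy improvement K = Huu^{-1} Hux."""
    n = A.shape[0]
    K = K0
    for _ in range(iters):
        P = evaluate_policy(A, B, C, Qy, R, K)
        H = q_kernel(A, B, C, Qy, R, P)
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K, P

# Hypothetical example system (open-loop stable, so K0 = 0 is stabilizing).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Qy = np.array([[1.0]])   # weight on the output y = C x
R = np.array([[1.0]])    # weight on the input u
K, P = policy_iteration(A, B, C, Qy, R, np.zeros((1, 2)))
print("feedback gain K =", K)
```

In this sketch the weights `Qy` and `R` are given and the gain `K` is computed from them. The inverse reinforcement learning setting described in the abstract runs in the other direction as well, adjusting the weights so that the resulting optimal trajectory matches a reference, and replaces the model-based steps with estimates obtained from input and output data.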