Cite this article: LI Meng-hua, WANG Ding, QIAO Jun-fei. Adaptive critic control for multi-player non-zero-sum games with asymmetric constraints[J]. Control Theory & Applications, 2023, 40(9): 1562-1568.
Adaptive critic control for multi-player non-zero-sum games with asymmetric constraints
Received: 2022-01-21  Revised: 2023-07-14
DOI  10.7641/CTA.2022.20063
  2023,40(9):1562-1568
Keywords  neural networks  adaptive critic control  adaptive dynamic programming  nonlinear systems  asymmetric constraints  multi-player non-zero-sum games
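Funding  Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112302, 2021ZD0112301), National Key Research and Development Program of China (2018YFC1900800–5), Beijing Natural Science Foundation (JQ19013), National Natural Science Foundation of China (62222301, 61890930–5, 62021003)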
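Author  Affiliation  E-mail
LI Meng-hua  Faculty of Information Technology, Beijing University of Technology  limenghua@emails.bjut.edu.cn
WANG Ding  Faculty of Information Technology, Beijing University of Technology
QIAO Jun-fei*  Faculty of Information Technology, Beijing University of Technology  adqiao@bjut.edu.cn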
Abstract
      In this paper, a neural-network-based adaptive critic control method is established for multi-player non-zero-sum games of continuous-time nonlinear systems with asymmetric constraints. First, a novel nonquadratic function is proposed to handle the asymmetric constraints, and the optimal control laws and the coupled Hamilton-Jacobi equations are derived. Notably, the optimal control strategies are nonzero when the system state is zero, which differs from previous results. Then, a single critic network is constructed to approximate the optimal cost function of each player, from which the associated approximate optimal control strategies are obtained. Meanwhile, a new weight updating rule is developed for critic learning. In addition, the stability of the critic network weight estimation errors and of the closed-loop system state is proved using the Lyapunov method. Finally, simulation results verify the effectiveness of the proposed method.
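The abstract does not state the nonquadratic function itself. Purely as an illustration, the sketch below uses a form that is common in the constrained adaptive dynamic programming literature: for a scalar input bounded by [u_min, u_max], with half-width a and centre b, the cost W(u) = 2ar∫_b^u atanh((s-b)/a) ds leads to a control of the form u = -a·tanh(G/(2ar)) + b, where G stands for g(x)ᵀ∇V(x). The function names, the scalar setting, and the weight `r` are assumptions made for this example and may differ from the authors' formulation. The sketch does reproduce the property noted in the abstract: when the gradient term is zero, the control equals b, which is nonzero whenever the bounds are asymmetric.

```python
import math


def asymmetric_bounds(u_min, u_max):
    """Return half-width a and centre b of the interval [u_min, u_max]."""
    a = (u_max - u_min) / 2.0
    b = (u_max + u_min) / 2.0
    return a, b


def constrained_control(grad_term, u_min, u_max, r=1.0):
    """Assumed control form u = -a * tanh(grad_term / (2 a r)) + b.

    grad_term is a stand-in for g(x)^T dV/dx (scalar input case).
    """
    a, b = asymmetric_bounds(u_min, u_max)
    return -a * math.tanh(grad_term / (2.0 * a * r)) + b


def nonquadratic_cost(u, u_min, u_max, r=1.0, steps=1000):
    """Assumed cost W(u) = 2 a r * integral_b^u atanh((s - b) / a) ds.

    Evaluated with the midpoint rule; u must lie strictly inside the bounds.
    """
    a, b = asymmetric_bounds(u_min, u_max)
    h = (u - b) / steps
    total = 0.0
    for k in range(steps):
        s = b + (k + 0.5) * h  # midpoint of the k-th sub-interval
        total += math.atanh((s - b) / a) * h
    return 2.0 * a * r * total


if __name__ == "__main__":
    # Hypothetical asymmetric bounds, chosen only for the demonstration.
    u_min, u_max = -1.0, 3.0
    for grad in (-10.0, 0.0, 10.0):
        print(grad, constrained_control(grad, u_min, u_max))
    print(nonquadratic_cost(2.0, u_min, u_max))
```

With these hypothetical bounds the centre is b = 1, so the control at a zero gradient term is 1 rather than 0, and the control stays inside [-1, 3] for any gradient value.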