Adaptive critic control for multi-player non-zero-sum games with asymmetric constraints

LI Meng-hua; WANG Ding; QIAO Jun-fei

Cite this article: LI Meng-hua, WANG Ding, QIAO Jun-fei. Adaptive critic control for multi-player non-zero-sum games with asymmetric constraints[J]. Control Theory & Applications, 2023, 40(9): 1562-1568.


Received: 2022-01-21; Revised: 2023-07-14


DOI: 10.7641/CTA.2022.20063

Year, volume(issue): pages: 2023, 40(9): 1562-1568


Keywords: neural networks; adaptive critic control; adaptive dynamic programming; nonlinear systems; asymmetric constraints; multi-player non-zero-sum games

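Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112302, 2021ZD0112301); National Key Research and Development Program of China (2018YFC1900800–5); Beijing Natural Science Foundation (JQ19013); National Natural Science Foundation of China (62222301, 61890930–5, 62021003)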

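Author	Affiliation	E-mail
LI Meng-hua	Faculty of Information Technology, Beijing University of Technology	limenghua@emails.bjut.edu.cn
WANG Ding	Faculty of Information Technology, Beijing University of Technology
QIAO Jun-fei (corresponding author)	Faculty of Information Technology, Beijing University of Technology	adqiao@bjut.edu.cn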


Abstract

In this paper, a neural-network-based adaptive critic control method is established for multi-player non-zero-sum games of continuous-time nonlinear systems with asymmetric constraints. First, a novel nonquadratic function is proposed to handle the asymmetric constraints, and the optimal control laws and the coupled Hamilton-Jacobi equations are derived. Notably, and unlike in previous work, the optimal control strategies are nonzero when the system state is zero. Then, a single critic network is constructed to approximate the optimal cost function of each player, from which the associated approximate optimal control strategies are obtained. Meanwhile, a new weight updating rule is developed for critic learning. In addition, the stability of the critic weight approximation errors and of the closed-loop system state is proved using Lyapunov theory. Finally, simulation results verify the effectiveness of the proposed method.
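The abstract does not give the form of the proposed nonquadratic function. Purely as a hypothetical illustration, the sketch below uses one common construction for asymmetric input saturation: the interval [u_min, u_max] is shifted to its center c and scaled by its half-width λ, and the penalty is W(u) = 2λr∫_c^u atanh((s - c)/λ) ds. Minimizing W(u) + p·u, where p stands for a player's term g_i(x)^T ∇V_i(x), gives u = c - λ·tanh(p/(2λr)). This shows why a constrained optimal control need not vanish at x = 0: when the value gradient is zero the minimizer is the center c, which is nonzero whenever the bounds are asymmetric. The function names, the weight r, and the scalar single-input setting are assumptions, not the authors' formulation.

```python
import math


def bounds_to_center_halfwidth(u_min, u_max):
    """Map an asymmetric interval [u_min, u_max] to (center c, half-width lam)."""
    if not u_min < u_max:
        raise ValueError("need u_min < u_max")
    return (u_max + u_min) / 2.0, (u_max - u_min) / 2.0


def nonquadratic_cost(u, u_min, u_max, r=1.0):
    """Hypothetical penalty W(u) = 2*lam*r * integral_c^u atanh((s - c)/lam) ds.

    Closed form via integral of atanh(z) dz = z*atanh(z) + 0.5*ln(1 - z^2).
    Zero at the interval center c, positive elsewhere, infinite at the bounds.
    """
    c, lam = bounds_to_center_halfwidth(u_min, u_max)
    z = (u - c) / lam
    if abs(z) >= 1.0:
        return math.inf
    return 2.0 * lam**2 * r * (z * math.atanh(z) + 0.5 * math.log(1.0 - z * z))


def constrained_control(grad_term, u_min, u_max, r=1.0):
    """Minimizer of W(u) + grad_term * u (stationarity: 2*lam*r*atanh(z) + grad_term = 0).

    grad_term is a stand-in for g_i(x)^T dV_i/dx of player i (assumed scalar here).
    The result always lies inside [u_min, u_max].
    """
    c, lam = bounds_to_center_halfwidth(u_min, u_max)
    return c - lam * math.tanh(grad_term / (2.0 * lam * r))


if __name__ == "__main__":
    # Hypothetical asymmetric bounds u in [-1, 3]; at x = 0 the gradient term is 0.
    print(constrained_control(0.0, -1.0, 3.0))  # 1.0, the interval center, not 0
```

With symmetric bounds such as [-2, 2], the center is 0 and the same formula reduces to the familiar tanh-saturated law that vanishes at the origin; the nonzero equilibrium control arises only from the asymmetry.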