Multi-agent learning in cooperative general-sum games

SONG Mei-ping; GU Guo-chang; ZHANG Guo-yin; LIU Hai-bo

Cite this article:	SONG Mei-ping, GU Guo-chang, ZHANG Guo-yin, LIU Hai-bo. Multi-agent learning in cooperative general-sum games[J]. Control Theory & Applications (控制理论与应用), 2007, 24(2): 317-321.

Received: 2005-07-11	Revised: 2006-05-18

DOI: 10.7641/j.issn.1000-8152.2007.2.029

2007,24(2):317-321

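Keywords (translated from Chinese): multi-agent learning; general-sum stochastic game; Nash equilibrium; Pareto dominance; Q-learning

Keywords (English): multi-agent learning; general-sum game; Nash equilibrium; Pareto optimum; Q-learning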

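Authors	Affiliation
SONG Mei-ping, GU Guo-chang, ZHANG Guo-yin, LIU Hai-bo	College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China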

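Abstract (translated from Chinese)

Rationality and convergence are the goals pursued in multi-agent learning research. For systems of rational, cooperative agents, this paper proposes learning with Pareto-dominant solutions in place of non-cooperative Nash equilibrium solutions, which makes the agents more rational. In addition, social conventions are introduced to initiate and constrain the agents' reasoning and to unify the decisions of all agents in the system, thereby guaranteeing convergence of learning. Several algorithms are evaluated on a two-player grid game, and a comparison of success rates shows that the proposed algorithm achieves better learning performance.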

Abstract (English)

Rationality and convergence are two goals of research on multi-agent learning. A new method called Pareto-Q is proposed, based on the concept of the Pareto optimum, which is more rational than the Nash equilibrium in a cooperative system. In addition, social conventions are introduced to guarantee the convergence of learning. When tested on a two-person grid game, the algorithm performs better than single-agent Q-learning and Nash-Q learning.
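
The core idea of the abstract, choosing a Pareto-optimal joint action instead of a Nash equilibrium and then using a shared convention to break ties, can be illustrated with a small sketch. This is not the paper's Pareto-Q algorithm, whose details are not given in the abstract. It covers only the action-selection step for a single state with two agents. The function names, the example Q-values, and the lexicographic tie-breaking rule are all assumptions made for illustration.

```python
from itertools import product


def pareto_optimal_joint_actions(q1, q2):
    """Return the joint actions that no other joint action Pareto-dominates.

    q1[a1][a2] and q2[a1][a2] are the two agents' Q-values for one state
    (hypothetical layout, chosen for this sketch).
    """
    joint = list(product(range(len(q1)), range(len(q1[0]))))

    def value(a):
        return (q1[a[0]][a[1]], q2[a[0]][a[1]])

    def dominates(x, y):
        # x dominates y if no agent is worse off and at least one is better off.
        pairs = list(zip(value(x), value(y)))
        return all(u >= v for u, v in pairs) and any(u > v for u, v in pairs)

    return [y for y in joint if not any(dominates(x, y) for x in joint)]


def select_by_convention(candidates):
    """Assumed social convention: every agent picks the lexicographically
    smallest joint action, so all agents agree without communicating."""
    return min(candidates)


# Prisoner's-dilemma-like stage game: action 0 = cooperate, 1 = defect.
q1 = [[3, 0], [5, 1]]
q2 = [[3, 5], [0, 1]]

candidates = pareto_optimal_joint_actions(q1, q2)
chosen = select_by_convention(candidates)
print(candidates, chosen)
```

In this example the only Nash equilibrium, mutual defection `(1, 1)`, is Pareto-dominated by mutual cooperation `(0, 0)` and is therefore excluded from the candidates. Because every agent applies the same deterministic convention to the same candidate set, they all settle on the same joint action, which is the role the abstract assigns to social conventions.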