Citation: Xiaoxiao Zhao, Peng Yi, Li Li. Distributed policy evaluation via inexact ADMM in multi-agent reinforcement learning [J]. Control Theory and Technology, 2020, 18(4): 362-378.



Distributed policy evaluation via inexact ADMM in multi-agent reinforcement learning
Xiaoxiao Zhao, Peng Yi, Li Li
(College of Electronic and Information Engineering, Tongji University, Shanghai 201804, China; Institute of Intelligent Science and Technology, Tongji University, Shanghai 201203, China)
Abstract:
This paper studies distributed policy evaluation in multi-agent reinforcement learning. In the cooperative setting, each agent obtains only a local reward, while all agents share a common environmental state. To optimize the global return, defined as the sum of the local returns, the agents exchange information with their neighbors over a communication network. The mean squared projected Bellman error minimization problem is reformulated as a constrained convex optimization problem with a consensus constraint; a distributed alternating direction method of multipliers (ADMM) algorithm is then proposed to solve it. Furthermore, an inexact step is used in ADMM to achieve efficient computation at each iteration. The convergence of the proposed algorithm is established.
Key words: Multi-agent system · Reinforcement learning · Distributed optimization · Policy evaluation
DOI: https://doi.org/10.1007/s11768-020-00007-x
Funding: This work was supported by the National Key Research and Development Program of Science and Technology, China (No. 2018YFB1305304), the Shanghai Science and Technology Pilot Project, China (No. 19511132100), the National Natural Science Foundation, China (No. 51475334), the Shanghai Sailing Program, China (No. 20YF1453000), and the Fundamental Research Funds for the Central Universities, China (No. 22120200048).
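
To make the abstract's description concrete, the following is a minimal sketch of consensus ADMM with an inexact primal step, which is the general technique the abstract names. It is not the paper's exact algorithm: for simplicity it uses a global averaging step rather than neighbor-to-neighbor exchanges over a communication graph, and each agent's MSPBE term is stood in for by a hypothetical local quadratic objective with synthetic data (A_i, b_i).

```python
# Hypothetical sketch (not the paper's algorithm): each agent i holds a local
# quadratic surrogate f_i(w) = 0.5 * ||A_i w - b_i||^2 of its policy-evaluation
# objective, and all agents must agree on a common weight vector (consensus
# constraint w_i = z). The "inexact" step replaces the exact w_i-minimization
# of ADMM with a single gradient step on the augmented Lagrangian.

import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 4, 5
rho, alpha = 1.0, 0.1            # ADMM penalty and inexact (gradient) step size

# Synthetic local data standing in for each agent's feature/TD statistics.
A = [rng.standard_normal((8, dim)) for _ in range(n_agents)]
b = [rng.standard_normal(8) for _ in range(n_agents)]

w = np.zeros((n_agents, dim))    # local primal variables w_i
z = np.zeros(dim)                # consensus (global) variable
lam = np.zeros((n_agents, dim))  # dual variables for w_i = z

for k in range(200):
    # Inexact primal update: one gradient step on the augmented Lagrangian
    # instead of solving each local subproblem exactly.
    for i in range(n_agents):
        grad = A[i].T @ (A[i] @ w[i] - b[i]) + lam[i] + rho * (w[i] - z)
        w[i] -= alpha * grad
    # Consensus update: average of local variables plus scaled duals.
    z = (w + lam / rho).mean(axis=0)
    # Dual ascent on the consensus constraint.
    for i in range(n_agents):
        lam[i] += rho * (w[i] - z)

print("consensus residual:", np.linalg.norm(w - z))
```

The inexact primal step is what makes each iteration cheap: agents avoid solving a local optimization problem to completion and instead perform a single first-order update, which is the computational saving the abstract refers to.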