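Bo PANG, Tao BIAN, Zhong-Ping JIANG. Adaptive dynamic programming for finite-horizon optimal control of linear time-varying discrete-time systems [J]. Control Theory and Technology, 2019, 17(1): 73-84.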
Adaptive dynamic programming for finite-horizon optimal control of linear time-varying discrete-time systems
Bo PANG, Tao BIAN, Zhong-Ping JIANG
(Control and Networks (CAN) Lab, Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, NY 11201, U.S.A.)
DOI:https://doi.org/10.1007/s11768-019-8168-8
Funding: The work of B. Pang and Z.-P. Jiang has been supported in part by the National Science Foundation (No. ECCS-1501044).
(Control and Networks (CAN) Lab, Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, NY 11201, U.S.A.; Bank of America Merrill Lynch, One Bryant Park, New York, NY 10036, U.S.A.)
Abstract:
This paper studies data-driven learning-based methods for the finite-horizon optimal control of linear time-varying discrete-time systems. First, a novel finite-horizon Policy Iteration (PI) method for linear time-varying discrete-time systems is presented. Its connections with existing infinite-horizon PI methods are discussed. Then, both data-driven off-policy PI and Value Iteration (VI) algorithms are derived to find approximate optimal controllers when the system dynamics is completely unknown. Under mild conditions, the proposed data-driven off-policy algorithms converge to the optimal solution. Finally, the effectiveness and feasibility of the developed methods are validated by a practical example of spacecraft attitude control.
Key words:  Optimal control, time-varying system, adaptive dynamic programming, policy iteration (PI), value iteration (VI)
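
For orientation, the following is a minimal model-based sketch of the finite-horizon problem the abstract refers to. It is not the paper's data-driven PI or VI algorithm. It assumes a quadratic cost (the abstract does not state the cost form) and assumes the matrices A[k] and B[k] are known, which is exactly what the paper's off-policy methods avoid. Under those assumptions, the optimal time-varying feedback gains follow from the standard backward Riccati recursion. All names (`finite_horizon_lqr`, `simulate_cost`, the example matrices) are hypothetical and chosen only for illustration.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, QN):
    """Backward Riccati recursion for x[k+1] = A[k] x[k] + B[k] u[k].

    A, B, Q, R are lists of length N (one matrix per time step);
    QN is the terminal cost weight. Returns gains K[k], used as
    u[k] = -K[k] x[k], and cost-to-go matrices P[0..N].
    """
    N = len(A)
    P = [None] * (N + 1)
    K = [None] * N
    P[N] = QN
    for k in range(N - 1, -1, -1):
        S = R[k] + B[k].T @ P[k + 1] @ B[k]
        K[k] = np.linalg.solve(S, B[k].T @ P[k + 1] @ A[k])
        Acl = A[k] - B[k] @ K[k]  # closed-loop matrix at step k
        P[k] = Q[k] + K[k].T @ R[k] @ K[k] + Acl.T @ P[k + 1] @ Acl
    return K, P

def simulate_cost(A, B, Q, R, QN, K, x0):
    """Roll out u[k] = -K[k] x[k] and accumulate the quadratic cost."""
    x = x0.copy()
    cost = 0.0
    for k in range(len(A)):
        u = -K[k] @ x
        cost += x @ Q[k] @ x + u @ R[k] @ u
        x = A[k] @ x + B[k] @ u
    return cost + x @ QN @ x

# Illustrative time-varying system (made-up numbers, not from the paper).
N = 20
A = [np.array([[1.0, 0.1], [0.1 * np.sin(0.3 * k), 1.0]]) for k in range(N)]
B = [np.array([[0.0], [0.1]]) for _ in range(N)]
Q = [np.eye(2) for _ in range(N)]
R = [np.eye(1) for _ in range(N)]
QN = np.eye(2)

K, P = finite_horizon_lqr(A, B, Q, R, QN)
x0 = np.array([1.0, -0.5])
optimal_cost = simulate_cost(A, B, Q, R, QN, K, x0)
predicted_cost = x0 @ P[0] @ x0
```

In this sketch the simulated cost of the computed gains should match the value predicted by `P[0]`, which gives a simple consistency check. A data-driven method of the kind described in the abstract would aim to recover comparable gains from measured trajectories without access to `A` and `B`.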