Cite this article: | TANG Hao, XI Hong-sheng, YIN Bao-qun. On-line optimization algorithm for Markov control processes based on a single sample path [J]. Control Theory & Applications (控制理论与应用), 2002, 19(6): 865~871. |
|
On-line optimization algorithm for Markov control processes based on a single sample path |
Received: 2001-05-14 Revised: 2001-11-13 |
DOI 10.7641/j.issn.1000-8152.2002.6.010 |
2002,19(6):865-871 |
Keywords Markov control processes; Markov performance potentials; randomized stationary policies; on-line optimization |
Funding Supported by the National Natural Science Foundation of China (69974037) and the National High Performance Computing Foundation (00208). |
|
Abstract |
Based on the theory of Markov performance potentials, this paper studies performance optimization algorithms for Markov control processes. Unlike traditional computation-based approaches, the algorithms estimate the gradient of the performance measure with respect to the policy parameters from the simulation of a single sample path, and use that gradient to search for an optimal (or suboptimal) randomized stationary policy. Because the algorithm parameters can be chosen to suit the characteristics of the system at hand, the approach can meet the on-line optimization needs of different real-world engineering systems. Finally, the convergence of the algorithms with probability one on an infinitely long sample path is briefly analyzed, and a numerical example for a three-state controlled Markov process is given. |
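
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the three-state, two-action model, all transition probabilities and costs, the single scalar policy parameter `theta` (probability of choosing action 0 in every state), and the function names are assumptions made for illustration only. It relies on the standard performance-potential identity dη/dθ = π[(dP/dθ)g + df/dθ], where g solves the Poisson equation (I − P)g + ηe = f, and it assumes the transition matrices are known, so that only the potentials and the stationary distribution are estimated from one simulated path.

```python
import random

# Hypothetical three-state, two-action model; every number is an illustrative assumption.
P0 = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]]  # transitions under action 0
P1 = [[0.1, 0.6, 0.3], [0.5, 0.1, 0.4], [0.4, 0.4, 0.2]]  # transitions under action 1
F0 = [1.0, 4.0, 2.0]  # one-step cost of action 0 in each state
F1 = [3.0, 1.0, 5.0]  # one-step cost of action 1 in each state
N = 3


def mix(theta):
    """Transition matrix and expected cost when action 0 is chosen w.p. theta."""
    P = [[theta * P0[i][j] + (1 - theta) * P1[i][j] for j in range(N)]
         for i in range(N)]
    f = [theta * F0[i] + (1 - theta) * F1[i] for i in range(N)]
    return P, f


def stationary(P, iters=500):
    """Stationary distribution by power iteration (all entries of P are positive)."""
    pi = [1.0 / N] * N
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(N)) for j in range(N)]
    return pi


def average_cost(theta):
    """Long-run average cost eta(theta) = pi(theta) f(theta)."""
    P, f = mix(theta)
    pi = stationary(P)
    return sum(pi[i] * f[i] for i in range(N))


def gradient_from_potentials(pi, g):
    """d(eta)/d(theta) = pi [ (P0 - P1) g + (F0 - F1) ]."""
    return sum(
        pi[i] * (sum((P0[i][j] - P1[i][j]) * g[j] for j in range(N)) + F0[i] - F1[i])
        for i in range(N))


def exact_gradient(theta, iters=500):
    """Model-based reference value: solve for the potentials, then apply the identity."""
    P, f = mix(theta)
    pi = stationary(P)
    eta = sum(pi[i] * f[i] for i in range(N))
    g = [0.0] * N
    for _ in range(iters):  # fixed-point iteration for g = f - eta + P g
        g = [f[i] - eta + sum(P[i][j] * g[j] for j in range(N)) for i in range(N)]
    return gradient_from_potentials(pi, g)


def path_gradient(theta, steps=100_000, window=20, seed=0):
    """Estimate the gradient from one simulated sample path under the policy theta."""
    rng = random.Random(seed)
    x, states, costs = 0, [], []
    for _ in range(steps):
        action0 = rng.random() < theta          # randomized stationary policy
        states.append(x)
        costs.append(F0[x] if action0 else F1[x])
        x = rng.choices(range(N), weights=P0[x] if action0 else P1[x])[0]
    eta = sum(costs) / steps                    # sample-path average cost
    prefix = [0.0]                              # prefix sums of (cost - eta)
    for c in costs:
        prefix.append(prefix[-1] + c - eta)
    visits, totals = [0] * N, [0.0] * N
    for n in range(steps - window):
        # Truncated potential: centered cost summed over the next `window` steps.
        visits[states[n]] += 1
        totals[states[n]] += prefix[n + window] - prefix[n]
    used = steps - window
    pi_hat = [visits[i] / used for i in range(N)]
    g_hat = [totals[i] / visits[i] for i in range(N)]
    return gradient_from_potentials(pi_hat, g_hat)
```

Under these assumptions, `path_gradient(0.5)` should agree with `exact_gradient(0.5)` up to simulation noise, and both can be checked against a finite difference of `average_cost`. The `window` and `steps` arguments stand in for the kind of tunable algorithm parameters the abstract mentions: a longer window reduces truncation bias in the potential estimate at the cost of higher variance.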