Citation: Masahiko SAKAGUCHI, Yoshio OHTSUBO. Markov decision processes associated with two threshold probability criteria[J]. Control Theory and Technology, 2013, 11(4): 548~557.
Received: August 21, 2012; Revised: May 28, 2013
Markov decision processes associated with two threshold probability criteria
Masahiko SAKAGUCHI, Yoshio OHTSUBO
(Department of Mathematics, Faculty of Science, Kochi University)
Abstract:
This paper deals with Markov decision processes with a target set and nonpositive rewards. Two types of threshold probability criteria are discussed. The first criterion is the probability that the total reward is not greater than a given initial threshold value, and the second is the probability that the total reward is less than it. Our first (resp. second) optimization problem is to minimize the first (resp. second) threshold probability. In these problems the threshold value is interpreted as a permissible level of the total reward for reaching a goal (the target set); that is, we would like to reach this set with a total reward above that level, if possible. For both problems, we show that 1) the optimal threshold probability is the unique solution to an optimality equation, 2) there exists an optimal deterministic stationary policy, and 3) a value iteration and a policy space iteration are available. In addition, we prove that the first (resp. second) optimal threshold probability is a monotone increasing and right (resp. left) continuous function of the initial threshold value, and we propose a method to obtain an optimal policy and the optimal threshold probability in the first problem by using those of the second problem.
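The first criterion can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a backward recursion on an optimality equation of the form F(s, x) = min_a Σ p(s'|s, a) F(s', x − r(s, a)), written under the simplifying assumption that every reward outside the target set is a strictly negative integer, so the recursion terminates. The two-state model, the action names, and the function `min_threshold_prob` are all hypothetical.

```python
from functools import lru_cache

# Hypothetical toy model (not from the paper): state 0 is ordinary, state 1 is the target.
TARGET = {1}
# ACTIONS[state][action] = (reward, {next_state: probability}).
# Assumption: rewards outside the target are strictly negative integers.
ACTIONS = {
    0: {
        "safe": (-1, {1: 0.5, 0: 0.5}),
        "fast": (-2, {1: 1.0}),
    }
}

@lru_cache(maxsize=None)
def min_threshold_prob(state, x):
    """Return (minimal P(total reward <= x) starting from `state`, minimizing action)."""
    if x >= 0:
        # The total reward is nonpositive, so it is always <= x.
        return 1.0, None
    if state in TARGET:
        # No further reward is collected: the total is 0, which exceeds x < 0.
        return 0.0, None
    best = None
    for action, (reward, transitions) in ACTIONS[state].items():
        # After earning `reward`, the remaining threshold becomes x - reward.
        value = sum(
            p * min_threshold_prob(nxt, x - reward)[0]
            for nxt, p in transitions.items()
        )
        if best is None or value < best[0]:
            best = (value, action)
    return best

if __name__ == "__main__":
    for x in (-1, -2, -3):
        print(x, min_threshold_prob(0, x))
```

In this toy model the minimizing action depends on the remaining threshold as well as on the state, so the policy is stationary only with respect to the pair (state, threshold).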
Key words: Markov decision process; Minimizing risk model; Threshold probability; Policy space iteration