Solving Controlled Markov Set-Chains With Discounting via Multipolicy Improvement
Chang, Hyeong Soo; Chong, Edwin K. P. Institute of Electrical and Electronics Engineers, 2007. IEEE Transactions on Automatic Control, Vol. 52, No. 3
<P>We consider Markov decision processes (MDPs) in which the state transition probability distributions are not uniquely known but are known to lie within given intervals (so-called "controlled Markov set-chains"), with infinite-horizon discounted reward criteria. We present formal methods for improving multiple policies when solving such controlled Markov set-chains. Our multipolicy improvement methods follow the spirit of parallel rollout and policy switching for solving MDPs. In particular, these methods are useful for online control of Markov set-chains and for designing policy iteration (PI) type algorithms. We develop a PI-type algorithm and prove that it converges to an optimal policy.</P>
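To make the idea of policy switching over interval-valued transition probabilities concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a pessimistic (worst-case) evaluation in which the interval bounds are given independently for each state-action pair, and a simple iterative evaluation with a fixed number of sweeps. All function names, the toy rewards, and the interval bounds are hypothetical and chosen only for illustration.

```python
def worst_case_expectation(lower, upper, values):
    """Minimise sum(p[j] * values[j]) subject to lower <= p <= upper, sum(p) == 1.

    Assumes sum(lower) <= 1 <= sum(upper), so a feasible distribution exists.
    """
    p = list(lower)
    slack = 1.0 - sum(lower)
    # Assign the remaining probability mass to the lowest-valued successors first.
    for j in sorted(range(len(values)), key=lambda k: values[k]):
        extra = min(upper[j] - lower[j], slack)
        p[j] += extra
        slack -= extra
    return sum(pj * vj for pj, vj in zip(p, values))


def pessimistic_value(policy, reward, lower, upper, gamma, sweeps=500):
    """Iteratively evaluate a stationary policy under worst-case transitions."""
    n = len(policy)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            reward[s][policy[s]]
            + gamma * worst_case_expectation(
                lower[s][policy[s]], upper[s][policy[s]], v
            )
            for s in range(n)
        ]
    return v


def policy_switching(policies, values):
    """At each state, follow the base policy whose value is largest there."""
    switched = []
    for s in range(len(policies[0])):
        best = max(range(len(policies)), key=lambda i: values[i][s])
        switched.append(policies[best][s])
    return switched


# Hypothetical toy problem: 2 states, 2 actions.
# reward[s][a]; lower[s][a][j] and upper[s][a][j] bound P(j | s, a).
reward = [[1.0, 0.0], [0.0, 2.0]]
lower = [[[0.6, 0.2], [0.1, 0.7]], [[0.5, 0.3], [0.2, 0.6]]]
upper = [[[0.8, 0.4], [0.3, 0.9]], [[0.7, 0.5], [0.4, 0.8]]]
gamma = 0.9

base_policies = [[0, 0], [1, 1]]  # two base policies to be combined
base_values = [
    pessimistic_value(pi, reward, lower, upper, gamma) for pi in base_policies
]
switched = policy_switching(base_policies, base_values)
switched_value = pessimistic_value(switched, reward, lower, upper, gamma)
```

Under these assumptions the worst-case evaluation operator is monotone, so the switched policy's pessimistic value should be at least as large as that of every base policy at every state; comparing `switched_value` with `base_values` checks this on the toy data.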