1 J. Filar, "Variance-penalized Markov decision processes" 14 (14): 147-161, 1989
2 R.-R. Chen, "Value iteration and optimization of multiclass queueing networks" 32 : 65-97, 1999
3 K. Boda, "Time consistent dynamic risk measures" 63 : 169-186, 2005
4 M. Sobel, "The variance of discounted Markov decision processes" 19 : 794-802, 1982
5 C. G. Turvey, "The semi-variance-minimizing hedge ratios" 28 (28): 100-115, 2003
6 F. Brauer, "The Qualitative Theory of Ordinary Differential Equations: An Introduction" Dover Publishers 1989
7 V. S. Borkar, "The ODE method for convergence of stochastic approximation and reinforcement learning" 38 (38): 447-469, 2000
8 M. Bouakiz, "Target-level criterion in Markov decision processes" 86 : 1-15, 1995
9 V. S. Borkar, "Stochastic approximation with two-time scales" 29 : 291-294, 1997
10 R. Cavazos-Cadena, "Solution to risk-sensitive average cost optimality equation in a class of MDPs with finite state space" 57 : 253-285, 2003
11 R. Porter, "Semivariance and stochastic dominance" 64 : 200-204, 1974
12 X.-R. Cao, "Semi-Markov decision problems and performance sensitivity analysis" 48 (48): 758-768, 2003
13 W.-Y. Kwon, "SSPQL: Stochastic Shortest Path-based Q-learning" Institute of Control, Robotics and Systems 9 (9): 328-338, 2011
14 W. Fleming, "Risk-sensitive control of finite state machines on an infinite horizon" 35 : 1790-1810, 1997
15 D. Hernandez-Hernandez, "Risk-sensitive control of Markov processes in countable state space" 29 : 147-155, 1996
16 V. Borkar, "Risk-sensitive optimal control for Markov decision processes with monotone cost" 27 : 192-209, 2002
17 A. E. B. Lim, "Risk-sensitive control with HARA utility" 46 (46): 563-578, 2001
18 T. Bielecki, "Risk-sensitive control of finite state Markov chains in discrete time" 50 : 167-188, 1999
19 G. Di Masi, "Risk-sensitive control of discrete-time Markov processes with infinite horizon" 38 (38): 61-78, 1999
20 R. Howard, "Risk-sensitive Markov decision processes" 18 (18): 356-369, 1972
21 S. J. Bradtke, "Reinforcement learning methods for continuous-time MDPs, In Advances in Neural Information Processing Systems 7" MIT Press 1995
22 A. Gosavi, "Reinforcement learning for long-run average cost" 155 : 654-674, 2004
23 J. Filar, "Percentile performance criteria for limiting average Markov decision processes" 40 : 2-10, 1995
24 E. Seneta, "Non-Negative Matrices and Markov Chains" Springer-Verlag 1981
25 D. P. Bertsekas, "Neuro-Dynamic Programming" Athena Scientific 1996
26 C. Wu, "Minimizing risk models in Markov decision processes with policies depending on target values" 231 : 47-67, 1999
27 D. White, "Minimizing a threshold probability in discounted Markov decision processes" 173 : 634-646, 1993
28 J. Estrada, "Mean-semivariance behavior: Downside risk and capital asset pricing" 16 : 169-185, 2007
29 M. L. Puterman, "Markov Decision Processes" Wiley Interscience 1994
30 J. Abounadi, "Learning algorithms for Markov decision processes with average cost" 40 : 681-698, 2001
31 J. Baxter, "Infinite-horizon policy-gradient estimation" 15 : 319-350, 2001
32 G. Hübner, "Improved procedures for eliminating sub-optimal actions in Markov programming by the use of contraction properties" 257-263, 1978
33 X.-R. Cao, "From perturbation analysis to Markov decision processes and reinforcement learning" 13 : 9-39, 2003
34 D. P. Bertsekas, "Dynamic Programming and Optimal Control, 2nd edition" Athena Scientific 2000
35 Qi Jiang, "Dynamic File Grouping for Load Balancing in Streaming Media Clustered Server Systems" Institute of Control, Robotics and Systems 7 (7): 630-637, 2009
36 K. Chung, "Discounted MDPs: distribution functions and exponential utility maximization" 25 : 49-62, 1987
37 R. Cavazos-Cadena, "Controlled Markov chains with risk-sensitive criteria" 43 : 121-139, 1999
38 E. Altman, "Constrained Markov Decision Processes" CRC Press 1998
39 V. S. Borkar, "Asynchronous stochastic approximation" 36 (36): 840-851, 1998
40 S. Ross, "Applied Probability Models with Optimization Applications" Dover 1992
41 A. Gosavi, "A risk-sensitive approach to total productive maintenance" 42 : 1321-1330, 2006
42 S. Singh, "A policy-gradient method for semi-Markov decision processes with application to call admission control" 178 (178): 808-818, 2007
43 V. S. Borkar, "A new analog parallel scheme for fixed point computation, part I: Theory" 44 : 351-355, 1997
44 A. Gosavi, "A budget-sensitive approach to scheduling maintenance in a total productive maintenance (TPM)" 23 (23): 46-56, 2011
45 H. C. Tijms, "A First Course in Stochastic Models, 2nd edition" Wiley 2003