
Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion

Published online by Cambridge University Press: 14 July 2016

Rolando Cavazos-Cadena*
Affiliation:
Universidad Autónoma Agraria Antonio Narro
Raúl Montes-de-Oca**
Affiliation:
Universidad Autónoma Metropolitana
*Postal address: Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, Saltillo COAH, 25315, México. Email address: [email protected]
**Postal address: Departamento de Matemáticas, Universidad Autónoma Metropolitana, Campus Iztapalapa, Avenida San Rafael Atlixco #186, Colonia Vicentina, México 09340, D.F. México. Email address: [email protected]

Abstract

This work concerns Markov decision chains with finite state spaces and compact action sets. The performance index is the long-run risk-sensitive average cost criterion, and it is assumed that, under each stationary policy, the state space is a communicating class and that the cost function and the transition law depend continuously on the action. These latter data are not directly available to the decision-maker, but convergent approximations are known or are more easily computed. In this context, the nonstationary value iteration algorithm is used to approximate the solution of the optimality equation, and to obtain a nearly optimal stationary policy.
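For orientation, the following display records a standard formulation of the risk-sensitive average criterion and its optimality equation from this literature; the notation (state space S, admissible actions A(x), cost C, transition law p, risk-sensitivity coefficient λ > 0) is supplied here for illustration and may differ from the paper's own.

```latex
% Risk-sensitive long-run average cost of a policy \pi starting at state x
% (a standard formulation in this literature; notation chosen for illustration):
J(\pi, x) \;=\; \limsup_{n \to \infty} \frac{1}{\lambda n}
  \log E_x^{\pi}\!\left[ \exp\!\left( \lambda \sum_{t=0}^{n-1} C(X_t, A_t) \right) \right].

% Risk-sensitive (multiplicative) optimality equation: a constant g and a
% function h on S satisfying, for every state x \in S,
e^{\lambda (g + h(x))} \;=\; \min_{a \in A(x)}
  \left[ e^{\lambda C(x, a)} \sum_{y \in S} p_{xy}(a)\, e^{\lambda h(y)} \right].
```

A minimal sketch of how a nonstationary value iteration of the kind described in the abstract might be organized, assuming stage-n approximations C_n and P_n of the true cost and transition law; this is an illustration under those assumptions, not the paper's exact scheme:

```python
import numpy as np

def nonstationary_vi(C_seq, P_seq, lam, n_iter, ref_state=0):
    """Illustrative nonstationary relative value iteration for the
    risk-sensitive average criterion (a sketch, not the paper's scheme).

    C_seq(n): (S, A) array, stage-n approximation of the cost C
    P_seq(n): (S, A, S) array, stage-n approximation of the transition law
    lam:      risk-sensitivity coefficient lambda > 0
    """
    S, _ = C_seq(0).shape
    U = np.ones(S)          # U = exp(lam * h), h a relative value function
    g = 0.0                 # running estimate of the optimal average cost
    for n in range(n_iter):
        C_n, P_n = C_seq(n), P_seq(n)
        # Multiplicative dynamic-programming step built from stage-n data:
        #   (T_n U)(x) = min_a exp(lam * C_n(x, a)) * sum_y P_n(x, a, y) * U(y)
        TU = (np.exp(lam * C_n) * np.einsum('xay,y->xa', P_n, U)).min(axis=1)
        g = np.log(TU[ref_state]) / lam   # gain estimate at a reference state
        U = TU / TU[ref_state]            # renormalize so U(ref_state) = 1
    # A stationary policy that is greedy with respect to the final iterate:
    Q = np.exp(lam * C_seq(n_iter)) * np.einsum('xay,y->xa', P_seq(n_iter), U)
    return g, np.log(U) / lam, Q.argmin(axis=1)
```

The renormalization at a reference state mirrors ordinary relative value iteration; the abstract asserts that, under the stated communication and continuity assumptions, the nonstationary algorithm approximates the solution of the optimality equation and yields a nearly optimal stationary policy.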

Type
Research Papers
Copyright
© Applied Probability Trust 2005 

Footnotes

This work was supported by the PSF Organization under grant no. 008/100/03-3.
