Nonstationary value iteration in controlled Markov chains with risk-sensitive average criterion

Published online by Cambridge University Press:  14 July 2016

Rolando Cavazos-Cadena*
Affiliation:
Universidad Autónoma Agraria Antonio Narro
Raúl Montes-De-Oca**
Affiliation:
Universidad Autónoma Metropolitana
*Postal address: Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, Saltillo COAH, 25315, México. Email address: [email protected]
**Postal address: Departamento de Matemáticas, Universidad Autónoma Metropolitana, Campus Iztapalapa, Avenida San Rafael Atlixco #186, Colonia Vicentina, México 09340, D.F. México. Email address: [email protected]

Abstract

This work concerns Markov decision chains with finite state spaces and compact action sets. The performance index is the long-run risk-sensitive average cost criterion, and it is assumed that, under each stationary policy, the state space is a communicating class and that the cost function and the transition law depend continuously on the action. These latter data are not directly available to the decision-maker, but convergent approximations are known or are more easily computed. In this context, the nonstationary value iteration algorithm is used to approximate the solution of the optimality equation, and to obtain a nearly optimal stationary policy.
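
For readers who want the objects named above in symbols, the following display reconstructs the standard formulation from the risk-sensitive average cost literature; it is an editorial gloss, not a quotation from the paper. For a fixed risk-sensitivity coefficient \lambda > 0, the performance index of a policy \pi starting at state x is

\[
J(x,\pi) \;=\; \limsup_{n\to\infty} \frac{1}{\lambda n}\,
\log E_x^{\pi}\!\left[\exp\!\left(\lambda \sum_{t=0}^{n-1} C(X_t, A_t)\right)\right],
\]

and the optimality equation determining the optimal average cost g together with a relative value function h reads

\[
e^{\lambda (g + h(x))} \;=\; \min_{a \in A(x)}
\left[\, e^{\lambda C(x,a)} \sum_{y} p_{xy}(a)\, e^{\lambda h(y)} \right],
\qquad x \in S.
\]

A minimal computational sketch of the nonstationary value iteration scheme may also be helpful: at stage n the exact model (C, p) is replaced by the available approximations (C_n, p_n), the risk-sensitive dynamic programming operator is applied, and the iterate is renormalized at a reference state. Everything below is illustrative; the identifiers and the renormalization device are assumptions of this sketch, not taken from the paper.

import numpy as np

def nonstationary_value_iteration(approx_models, lam=0.5, x0=0):
    """Sketch of nonstationary relative value iteration for the
    risk-sensitive average cost criterion.

    approx_models yields pairs (C_n, P_n): converging approximations
    of the one-step cost C_n[x, a] and the transition law P_n[x, a, y].
    All names are illustrative, not from the paper.
    """
    V, g, policy = None, None, None
    for C_n, P_n in approx_models:
        if V is None:
            V = np.zeros(C_n.shape[0])
        # Risk-sensitive dynamic programming operator in log-exp form:
        # Q[x, a] = C_n[x, a] + (1/lam) * log sum_y P_n[x, a, y] * exp(lam * V[y])
        Q = C_n + np.log(P_n @ np.exp(lam * V)) / lam
        V_new = Q.min(axis=1)       # minimize over actions
        g = V_new[x0]               # running estimate of the optimal average cost
        V = V_new - g               # relative normalization keeps iterates bounded
        policy = Q.argmin(axis=1)   # current greedy stationary policy
    return g, V, policy

The subtraction at the reference state x0 is the usual relative value iteration device: it keeps the iterates bounded, the normalizing constants track the optimal average cost, and the final greedy policy plays the role of the nearly optimal stationary policy mentioned above. For large values of lam the exponentials can overflow, in which case a log-sum-exp evaluation of the inner sum is preferable.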

Type
Research Papers
Copyright
© Applied Probability Trust 2005 

Footnotes

This work was supported by the PSF Organization under grant no. 008/100/03-3.
