
An analysis of transient Markov decision processes

Published online by Cambridge University Press:  14 July 2016

Huw W. James*
Affiliation: University of Bristol
E. J. Collins**
Affiliation: University of Bristol
* Current address: Commerzbank Corporates and Markets, 60 Gracechurch Street, London EC3V 0HR, UK. Email address: [email protected]
** Postal address: School of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK.
Rights & Permissions [Opens in a new window]

Abstract


This paper is concerned with the analysis of Markov decision processes in which a natural form of termination ensures that the expected future costs are bounded, at least under some policies. Whereas most previous analyses have restricted attention to the case where the set of states is finite, this paper analyses the case where the set of states is not necessarily finite or even countable. It is shown that all the existence, uniqueness, and convergence results of the finite-state case hold when the set of states is a general Borel space, provided we make the additional assumption that the optimal value function is bounded below. We give a sufficient condition for the optimal value function to be bounded below which holds, in particular, if the set of states is countable.
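Although the paper's contribution is the extension to general Borel state spaces, the finite-state setting gives concrete intuition for the existence, uniqueness, and convergence results it generalizes. Below is a minimal value-iteration sketch for a toy transient model with a zero-cost absorbing termination state; the states, actions, costs, and transition probabilities are invented for illustration and are not taken from the paper. Because termination occurs with positive probability at each step, expected total costs are bounded and the iterates converge to the optimal value function.

```python
import numpy as np

# Toy transient MDP: states 0 and 1 are transient, state 2 is a
# zero-cost absorbing termination state. All numbers are illustrative.
n_states, n_actions = 3, 2

# P[a, s, s'] = transition probability under action a;
# the termination state is absorbing under both actions.
P = np.array([
    [[0.5, 0.3, 0.2],   # action 0
     [0.1, 0.6, 0.3],
     [0.0, 0.0, 1.0]],
    [[0.2, 0.2, 0.6],   # action 1
     [0.3, 0.1, 0.6],
     [0.0, 0.0, 1.0]],
])

# c[a, s] = one-stage cost; zero at the termination state.
c = np.array([
    [2.0, 1.0, 0.0],
    [4.0, 3.0, 0.0],
])

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman operator: (TV)(s) = min_a [ c(s,a) + sum_{s'} P(s'|s,a) V(s') ]
    Q = c + P @ V            # Q[a, s]
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# A stationary policy attaining the minimum in the optimality equation.
policy = (c + P @ V).argmin(axis=0)
print("optimal value function:", V)
print("optimal policy:        ", policy)
```

In this finite case the optimal value function is automatically bounded below; the paper's additional assumption matters precisely when the state space is uncountable.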

Type: Research Article
Copyright: © Applied Probability Trust 2006
