
An analysis of transient Markov decision processes

Published online by Cambridge University Press:  14 July 2016

Huw W. James*
Affiliation: University of Bristol
E. J. Collins**
Affiliation: University of Bristol
* Current address: Commerzbank Corporates and Markets, 60 Gracechurch Street, London EC3V 0HR, UK. Email address: [email protected]
** Postal address: School of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK.
Rights & Permissions [Opens in a new window]

Abstract


This paper is concerned with the analysis of Markov decision processes in which a natural form of termination ensures that the expected future costs are bounded, at least under some policies. Whereas most previous analyses have restricted attention to the case where the set of states is finite, this paper analyses the case where the set of states is not necessarily finite or even countable. It is shown that all the existence, uniqueness, and convergence results of the finite-state case hold when the set of states is a general Borel space, provided we make the additional assumption that the optimal value function is bounded below. We give a sufficient condition for the optimal value function to be bounded below which holds, in particular, if the set of states is countable.
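Although the paper's contribution is the extension to general Borel state spaces, the finite-state setting gives concrete intuition for the existence, uniqueness, and convergence results it generalizes. Below is a minimal value-iteration sketch for a toy transient model with a zero-cost absorbing termination state; the states, actions, costs, and transition probabilities are invented for illustration and are not taken from the paper. Because termination occurs with positive probability at each step, expected total costs are bounded and the iterates converge to the optimal value function.

```python
import numpy as np

# Toy transient MDP: states 0 and 1 are transient, state 2 is a
# zero-cost absorbing termination state. All numbers are illustrative.
n_states, n_actions = 3, 2

# P[a, s, s'] = transition probability under action a;
# the termination state is absorbing under both actions.
P = np.array([
    [[0.5, 0.3, 0.2],   # action 0
     [0.1, 0.6, 0.3],
     [0.0, 0.0, 1.0]],
    [[0.2, 0.2, 0.6],   # action 1
     [0.3, 0.1, 0.6],
     [0.0, 0.0, 1.0]],
])

# c[a, s] = one-stage cost; zero at the termination state.
c = np.array([
    [2.0, 1.0, 0.0],
    [4.0, 3.0, 0.0],
])

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman operator: (TV)(s) = min_a [ c(s,a) + sum_{s'} P(s'|s,a) V(s') ]
    Q = c + P @ V            # Q[a, s]
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# A stationary policy attaining the minimum in the optimality equation.
policy = (c + P @ V).argmin(axis=0)
print("optimal value function:", V)
print("optimal policy:        ", policy)
```

In this finite case the optimal value function is automatically bounded below; the paper's additional assumption matters precisely when the state space is uncountable.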

Type: Research Article
Copyright: © Applied Probability Trust 2006
