Hostname: page-component-745bb68f8f-5r2nc Total loading time: 0 Render date: 2025-01-15T16:06:10.215Z Has data issue: false hasContentIssue false

New discount and average optimality conditions for continuous-time Markov decision processes

Published online by Cambridge University Press:  01 July 2016

Xianping Guo*
Affiliation:
Sun Yat-Sen University
Liuer Ye*
Affiliation:
Sun Yat-Sen University
*
Postal address: School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, 510275, P. R. China.
Postal address: School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, 510275, P. R. China.
Rights & Permissions [Opens in a new window]

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

This paper deals with continuous-time Markov decision processes in Polish spaces, under the discounted and average cost criteria. All underlying Markov processes are determined by given transition rates which are allowed to be unbounded, and the costs are assumed to be bounded below. By introducing an occupation measure of a randomized Markov policy and analyzing properties of occupation measures, we first show that the family of all randomized stationary policies is ‘sufficient’ within the class of all randomized Markov policies. Then, under the semicontinuity and compactness conditions, we prove the existence of a discounted cost optimal stationary policy by providing a value iteration technique. Moreover, by developing a new average cost, minimum nonnegative solution method, we prove the existence of an average cost optimal stationary policy under some reasonably mild conditions. Finally, we use some examples to illustrate applications of our results. Except that the costs are assumed to be bounded below, the conditions for the existence of discounted cost (or average cost) optimal policies are much weaker than those in the previous literature, and the minimum nonnegative solution approach is new.

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 2010 

Footnotes

Research supported by the NSFC and GDUPS (2010).

References

Anderson, W. J. (1991). Continuous-Time Markov Chains. Springer, New York.Google Scholar
Chen, M.-F. (2004). From Markov Chains to Non-Equilibrium Particle Systems, 2nd edn. World Scientific, River Edge, NJ.Google Scholar
Doshi, B. T. (1976). Continuous time control of Markov processes on an arbitrary state space: discounted rewards. Ann. Statist. 4, 12191235.Google Scholar
Feinberg, E. A. (2004). Continuous time discounted Jump Markov decision processes: a discrete-event approach. Math. Operat. Res. 29, 492524.Google Scholar
Feller, W. (1940). On the integro-differential equations of purely discontinuous Markoff processes. Trans. Amer. Math. Soc. 48, 488515.Google Scholar
Guo, X. P. (2007). Constrained optimization for average cost continuous-time Markov decision processes. IEEE Trans. Automatic Control 52, 11391143.Google Scholar
Guo, X. P. (2007). Continuous time Markov decision processes with discounted rewards: the case of Polish spaces. Math. Operat. Res. 32, 7387.Google Scholar
Guo, X. P. and Hernández-Lerma, O. (2003). Continuous-time controlled Markov chains. Ann. Appl. Prob. 13, 363388.Google Scholar
Guo, X. P. and Hernández-Lerma, O. (2003). Drift and monotonicity conditions for continuous-time controlled Markov chains with an average criterion. IEEE Trans. Automatic Control 48, 236245.Google Scholar
Guo, X. P. and Rieder, U. (2006). Average optimality for continuous-time Markov decision processes in Polish spaces. Ann. Appl. Prob. 16, 730756.CrossRefGoogle Scholar
Guo, X. P., Hernández-Lerma, O. and Prieto-Rumeau, T. (2006). A survey of resent results on continuous-time Markov decision processes. TOP 14, 177261.CrossRefGoogle Scholar
Hernández-Lerma, O. and Govindan, T. E. (2001). Nonstationary continuous-time Markov control processes with discounted costs on infinite horizon. Acta Appl. Math. 67, 277293.CrossRefGoogle Scholar
Hernández-Lerma, O. and Lasserre, J. B. (1996). Discrete-Time Markov Control Processes. Springer, New York.Google Scholar
Hernández-Lerma, O. and Lasserre, J. B. (1999). Further Topics on Discrete-Time Markov Control Processes. Springer, New York.Google Scholar
Kitaev, M. Y. and Rykov, V. V. (1995). Controlled Queueing Systems. CRC Press, Boca Raton, FL.Google Scholar
Lewis, M. E. and Puterman, M. L. (2000). A note on bias optimality in controlled queueing systems. J. Appl. Prob. 37, 300305.Google Scholar
Lund, R. B., Meyn, S. P. and Tweedie, R. L. (1996). Computable exponential convergence rates for stochastically ordered Markov processes. Ann. Appl. Prob. 6, 218237.Google Scholar
Prieto-Rumeau, T. and Hernández-Lerma, O. (2006). Bias optimality for continuous-time controlled Markov chains. SIAM J. Control Optimization 45, 5173.Google Scholar
Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New York.Google Scholar
Sennott, L. I. (1999). Stochastic Dynamic Programming and the Control of Queueing Systems. John Wiley, New York.Google Scholar
Wang, Z. K. and Yang, X. Q. (1992). Birth and Death Processes and Markov Chains. Science Press, Beijing.Google Scholar
Ye, L., Guo, X. P. and Hernández-Lerma, O. (2008). Existence and regularity of a nonhomogeneous transition matrix under measurability conditions. J. Theoret. Prob. 21, 604627.Google Scholar
Ye, L. E. and Guo, X. P. (2010). Construction and regularity of transition functions on Polish spaces under measurablity conditions. Submitted.Google Scholar
Zhu, Q. X. (2007). Average optimality inequality for continuous-time Markov decision processes in Polish spaces. Math. Meth. Operat. Res. 66, 299313.Google Scholar
Zhu, Q. X. (2008). Average optimality for continuous-time Markov decision processes with a policy iteration approach. J. Math. Anal. Appl. 339, 691704.Google Scholar