
Note on discounted continuous-time Markov decision processes with a lower bounding function

Published online by Cambridge University Press: 30 November 2017

Xin Guo, Alexey Piunovskiy and Yi Zhang

Affiliation: University of Liverpool

Postal address: Department of Mathematical Sciences, University of Liverpool, Liverpool L69 7ZL, UK.

Abstract

We consider the discounted continuous-time Markov decision process (CTMDP), where the negative part of each cost rate is bounded by a drift function, say w, whereas the positive part is allowed to be arbitrarily unbounded. Our focus is on the existence of a stationary optimal policy for the discounted CTMDP problem out of the more general class of policies. Both constrained and unconstrained problems are considered. Our investigations are based on a continuous-time version of the Veinott transformation. This technique has not been widely employed in the previous literature on CTMDPs, but it clarifies the roles of the imposed conditions in a rather transparent way.
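For orientation, here is a minimal sketch, in standard CTMDP notation, of the discounted criterion and the kind of lower bounding condition the abstract refers to; the symbols below (\alpha, c, \xi_t, M, \rho) are illustrative assumptions, not notation quoted from the paper itself:

\[ V_\alpha(x, \pi) = \mathbb{E}_x^\pi \left[ \int_0^\infty e^{-\alpha t} \, c(\xi_t, a_t) \, dt \right], \]

where \alpha > 0 is the discount factor, \{\xi_t\} is the controlled jump process under policy \pi, and c(x, a) is the cost rate. Writing c = c^+ - c^- for its positive and negative parts, the lower bounding function w \ge 1 enters through a requirement of the type

\[ c^-(x, a) \le M \, w(x) \quad \text{for all } (x, a), \qquad c^+ \text{ unrestricted}, \]

typically paired with a drift condition such as \( \int_X w(y) \, q(dy \mid x, a) \le \rho \, w(x) \) on the transition rates q, with constants M and \rho; conditions of this kind keep the discounted value bounded from below while the positive part of the cost may be arbitrarily unbounded.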

Type: Research Papers
Copyright: © Applied Probability Trust 2017
