
Computable approximations for average Markov decision processes in continuous time

Published online by Cambridge University Press:  26 July 2018

Jonatha Anselmi*
Affiliation: INRIA
François Dufour**
Affiliation: INRIA and Université de Bordeaux
Tomás Prieto-Rumeau***
Affiliation: UNED
* Postal address: INRIA Bordeaux Sud Ouest, Bureau B430, 200 av. de la Vieille Tour, 33405 Talence cedex, France. Email address: [email protected]
** Postal address: Institut de Mathématiques de Bordeaux, Université de Bordeaux, 351 cours de la Libération, F33405 Talence, France. Email address: [email protected]
*** Postal address: Statistics Department, UNED, calle Senda del Rey 9, 28040 Madrid, Spain. Email address: [email protected]

Abstract

In this paper we study the numerical approximation of the optimal long-run average cost of a continuous-time Markov decision process, with Borel state and action spaces, and with bounded transition and reward rates. Our approach uses a suitable discretization of the state and action spaces to approximate the original control model. The approximation error for the optimal average reward is then bounded by a linear combination of coefficients related to the discretization of the state and action spaces, namely, the Wasserstein distance between an underlying probability measure μ and a measure with finite support, and the Hausdorff distance between the original and the discretized action sets. When approximating μ by its empirical probability measure, we obtain convergence in probability at an exponential rate. An application to a queueing system is presented.
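To make the two discretization coefficients concrete, the following minimal Python sketch estimates a 1-Wasserstein distance between a measure μ and an empirical approximation with finite support, and a Hausdorff distance between an action set and a uniform grid, then combines them into a bound of the form K1·W1 + K2·dH. All ingredients here are illustrative assumptions rather than the paper's construction: a Beta distribution standing in for μ, the interval [0, 1] as the action set, placeholder constants K1 and K2, and SciPy's wasserstein_distance and directed_hausdorff helpers.

```python
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import directed_hausdorff

rng = np.random.default_rng(0)

# Hypothetical one-dimensional state space: mu is taken to be a Beta(2, 5) law on [0, 1].
n = 2000                                       # support size of the empirical measure
mu_samples = rng.beta(2.0, 5.0, size=50_000)   # large-sample proxy for the "true" measure mu
emp_samples = rng.beta(2.0, 5.0, size=n)       # empirical measure mu_n with finite support

# 1-Wasserstein distance between mu and its empirical approximation (1-D case).
w1 = wasserstein_distance(mu_samples, emp_samples)

# Hypothetical action set A = [0, 1], discretized on a uniform grid of step h.
h = 0.05
A_dense = np.linspace(0.0, 1.0, 10_001).reshape(-1, 1)   # fine proxy for the original action set
A_grid = np.arange(0.0, 1.0 + h, h).reshape(-1, 1)       # discretized action set

# Hausdorff distance between the original and discretized action sets.
d_H = max(directed_hausdorff(A_dense, A_grid)[0],
          directed_hausdorff(A_grid, A_dense)[0])

# Error bound of the form K1 * W1(mu, mu_n) + K2 * d_H(A, A_grid).
K1, K2 = 1.0, 1.0   # placeholder constants; the paper derives problem-specific ones
print(f"W1 ~ {w1:.4f}, d_H ~ {d_H:.4f}, bound ~ {K1 * w1 + K2 * d_H:.4f}")
```

In this toy setting both terms shrink as the sample size n grows and the grid step h decreases, which mirrors the role the two distances play in the error bound for the optimal average reward.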

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 2018 

