Sample-Path Optimal Stationary Policies in Stable Markov Decision Chains with the Average Reward Criterion

Published online by Cambridge University Press:  30 January 2018

Rolando Cavazos-Cadena*
Affiliation:
Universidad Autónoma Agraria Antonio Narro
Raúl Montes-De-Oca**
Affiliation:
Universidad Autónoma Metropolitana-Iztapalapa
Karel Sladký***
Affiliation:
Institute of Information Theory and Automation
* Postal address: Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, Saltillo, COAH, 25315, México. Email address: [email protected]
** Postal address: Departamento de Matemáticas, Universidad Autónoma Metropolitana, Campus Iztapalapa, Avenida San Rafael Atlixco #186, Colonia Vicentina, México 09340, D. F. México.
*** Postal address: Institute of Information Theory and Automation, Pod Vodárenskou věží 4, CZ-182 08, Praha 8, Czech Republic.

Abstract

This paper concerns discrete-time Markov decision chains with a denumerable state space and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under any policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
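
For orientation, the objects named in the abstract can be written out in the notation commonly used for average reward problems. The symbols below (S, A(x), r, p, g, h, f) are assumptions introduced here for illustration and do not come from the abstract. The last display is one common way to formalize sample-path optimality, and the precise "strong sense" used in the paper may differ.

```latex
% Assumed standard notation (not taken from the abstract):
%   S         denumerable state space
%   A(x)      compact set of actions available at state x
%   r(x,a)    one-step reward
%   p_{xy}(a) transition probability from x to y under action a
%   g         optimal average reward, h relative value function

% Average reward optimality equation:
g + h(x) = \sup_{a \in A(x)} \Big[ r(x,a) + \sum_{y \in S} p_{xy}(a)\, h(y) \Big],
\qquad x \in S.

% Stationary policy f obtained "in the standard way": f(x) attains the supremum,
g + h(x) = r\big(x, f(x)\big) + \sum_{y \in S} p_{xy}\big(f(x)\big)\, h(y),
\qquad x \in S.

% Sample-path average optimality of f (one common formulation):
\lim_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(X_t, A_t) = g
\quad \text{almost surely under } f,
\qquad
\limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(X_t, A_t) \le g
\quad \text{almost surely under every policy.}
```

Read this way, the criterion compares realized time averages along each trajectory rather than their expectations, which is why a moment condition such as the one on ℓ² enters the abstract's main conclusion.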

Type
Research Article
Copyright
© Applied Probability Trust 
