
Sample-Path Optimal Stationary Policies in Stable Markov Decision Chains with the Average Reward Criterion

Published online by Cambridge University Press: 30 January 2018

Rolando Cavazos-Cadena*
Affiliation:
Universidad Autónoma Agraria Antonio Narro
Raúl Montes-De-Oca**
Affiliation:
Universidad Autónoma Metropolitana-Iztapalapa
Karel Sladký***
Affiliation:
Institute of Information Theory and Automation
* Postal address: Departamento de Estadística y Cálculo, Universidad Autónoma Agraria Antonio Narro, Buenavista, Saltillo, COAH 25315, México. Email address: [email protected]
** Postal address: Departamento de Matemáticas, Universidad Autónoma Metropolitana, Campus Iztapalapa, Avenida San Rafael Atlixco #186, Colonia Vicentina, 09340 México, D.F., México.
*** Postal address: Institute of Information Theory and Automation, Pod Vodárenskou věží 4, CZ-182 08 Praha 8, Czech Republic.

Abstract


This paper concerns discrete-time Markov decision chains with a denumerable state space and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under every policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
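For context, the displays below sketch the standard objects behind this statement in the usual average reward notation (state space S, action sets A(x), reward r, transition law p); this notation is assumed for illustration and is not quoted from the paper.

% Average reward optimality equation (standard form; notation assumed):
% g is the optimal average reward and h a relative value function.
\[
  g + h(x) = \max_{a \in A(x)} \Big[ r(x,a) + \sum_{y \in S} p(y \mid x,a)\, h(y) \Big],
  \qquad x \in S.
\]
% A stationary policy f^* that selects a maximizing action in each state is
% sample-path average optimal in the strong sense: almost surely,
\[
  \lim_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(X_t, A_t) = g
  \quad \text{under } f^*,
  \qquad
  \limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} r(X_t, A_t) \le g
  \quad \text{under every policy}.
\]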

Type
Research Article
Copyright
© Applied Probability Trust 
