
Detecting optimal and non-optimal actions in average-cost Markov decision processes

Published online by Cambridge University Press: 14 July 2016

Jean B. Lasserre*
Affiliation: LAAS-CNRS
* Postal address: LAAS-CNRS, 7 Av. Colonel Roche, 31077 Toulouse Cedex, France.

Abstract

We present two sufficient conditions for detecting optimal and non-optimal actions in (ergodic) average-cost MDPs. They admit a simple interpretation and can be implemented as detection tests in both policy iteration and linear programming methods. An efficient implementation of a recently proposed policy iteration scheme is also discussed.
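The two conditions themselves sit in the full text and are not reproduced here. To make the setting concrete, the following is a minimal sketch, not the paper's method, of policy iteration for an ergodic average-cost MDP with finite state and action spaces, the framework in which such detection tests are embedded. The test quantity is the Bellman discrepancy q(i, a) = c(i, a) + sum_j p(i, a, j) h(j) - g - h(i): at an optimal gain g and relative values h, any action with strictly positive discrepancy is non-optimal, and Hastings-type tests [5] aim to discard such actions before convergence. NumPy, the function names, and the tolerance are illustrative assumptions.

```python
import numpy as np

def evaluate_policy(P, c, policy):
    """Solve the average-cost evaluation equations
        g + h(i) = c(i, policy(i)) + sum_j p(i, policy(i), j) h(j),
    normalized by h(0) = 0 (well posed for an ergodic chain)."""
    n = P.shape[1]
    P_pi = P[policy, np.arange(n), :]   # transition matrix under the policy
    c_pi = c[policy, np.arange(n)]      # one-step costs under the policy
    A = np.zeros((n, n))
    A[:, 0] = 1.0                       # column for the gain g
    A[:, 1:] = np.eye(n)[:, 1:] - P_pi[:, 1:]
    x = np.linalg.solve(A, c_pi)
    return x[0], np.concatenate(([0.0], x[1:]))   # gain g, relative values h

def policy_iteration(P, c, tol=1e-10):
    """Policy iteration for an ergodic average-cost MDP.
    P: (num_actions, n, n) transition kernels; c: (num_actions, n) costs.
    Returns an optimal stationary policy, its gain g, and relative values h."""
    n = P.shape[1]
    policy = np.zeros(n, dtype=int)
    while True:
        g, h = evaluate_policy(P, c, policy)
        # Bellman discrepancy q(a, i); it vanishes at the current policy's
        # actions.  A detection test discards (i, a) pairs whose discrepancy
        # is provably bounded away from zero.
        q = c + P @ h - (g + h)
        if q.min() >= -tol:             # no improving action anywhere
            return policy, g, h
        policy = q.argmin(axis=0)       # improvement step

# A tiny sanity check: two states, two actions, all chains ergodic.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
c = np.array([[2.0, 1.0],
              [0.5, 3.0]])
policy, g, h = policy_iteration(P, c)
```

In the linear-programming formulation of the same problem, these discrepancies appear as the reduced costs of the LP, which is why tests of this kind carry over to simplex-based methods (cf. Cheng [3] on simplex criteria).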

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1994 

References

[1] Bean, J., Hopp, W. and Duenyas, I. (1993) A stopping rule for forecast horizons in nonhomogeneous Markov decision processes. Operat. Res. 40, 1188–1199.
[2] Bes, C. and Lasserre, J. B. (1986) An on-line procedure in discounted infinite horizon stochastic optimal control. J. Optim. Theory Appl. 50, 61–67.
[3] Cheng, M. C. (1980) New criteria for the simplex algorithm. Math. Progr. 19, 230–236.
[4] Denardo, E. V. and Fox, B. (1968) Multichain renewal programs. SIAM J. Appl. Math. 16, 468–487.
[5] Hastings, N. A. J. (1976) A test for non-optimal actions in undiscounted finite Markov decision chains. Management Sci. 23, 87–92.
[6] Howard, R. (1960) Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA.
[7] Kallenberg, L. M. C. (1983) Linear Programming and Finite Markovian Control Problems. Mathematical Centre Tracts, Mathematisch Centrum, Amsterdam.
[8] Lasserre, J. B. (1994) A new policy iteration scheme for Markov decision processes using Schweitzer's formula. J. Appl. Prob. 31, 268–273.
[9] Puterman, M. (1990) Markov decision processes. In Handbooks in Operations Research and Management Science 2: Stochastic Models, ed. Heyman, D. P. and Sobel, M., North-Holland, Amsterdam.
[10] Van Der Wal, J. (1976) A successive approximation algorithm. Computing 17, 157–162.