
Detecting optimal and non-optimal actions in average-cost Markov decision processes

Published online by Cambridge University Press: 14 July 2016

Jean B. Lasserre*
Affiliation: LAAS-CNRS
* Postal address: LAAS-CNRS, 7 Av. Colonel Roche, 31077 Toulouse Cedex, France.

Abstract

We present two sufficient conditions for detecting optimal and non-optimal actions in (ergodic) average-cost MDPs. They admit a simple interpretation and can be implemented as detection tests in both policy iteration and linear programming methods. An efficient implementation of a recently proposed policy iteration scheme is also discussed.
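The two conditions themselves sit in the full text and are not reproduced here. To make the setting concrete, the following is a minimal sketch, not the paper's method, of policy iteration for an ergodic average-cost MDP with finite state and action spaces, the framework in which such detection tests are embedded. The test quantity is the Bellman discrepancy q(i, a) = c(i, a) + sum_j p(i, a, j) h(j) - g - h(i): at an optimal gain g and relative values h, any action with strictly positive discrepancy is non-optimal, and Hastings-type tests [5] aim to discard such actions before convergence. NumPy, the function names, and the tolerance are illustrative assumptions.

```python
import numpy as np

def evaluate_policy(P, c, policy):
    """Solve the average-cost evaluation equations
        g + h(i) = c(i, policy(i)) + sum_j p(i, policy(i), j) h(j),
    normalized by h(0) = 0 (well posed for an ergodic chain)."""
    n = P.shape[1]
    P_pi = P[policy, np.arange(n), :]   # transition matrix under the policy
    c_pi = c[policy, np.arange(n)]      # one-step costs under the policy
    A = np.zeros((n, n))
    A[:, 0] = 1.0                       # column for the gain g
    A[:, 1:] = np.eye(n)[:, 1:] - P_pi[:, 1:]
    x = np.linalg.solve(A, c_pi)
    return x[0], np.concatenate(([0.0], x[1:]))   # gain g, relative values h

def policy_iteration(P, c, tol=1e-10):
    """Policy iteration for an ergodic average-cost MDP.
    P: (num_actions, n, n) transition kernels; c: (num_actions, n) costs.
    Returns an optimal stationary policy, its gain g, and relative values h."""
    n = P.shape[1]
    policy = np.zeros(n, dtype=int)
    while True:
        g, h = evaluate_policy(P, c, policy)
        # Bellman discrepancy q(a, i); it vanishes at the current policy's
        # actions.  A detection test discards (i, a) pairs whose discrepancy
        # is provably bounded away from zero.
        q = c + P @ h - (g + h)
        if q.min() >= -tol:             # no improving action anywhere
            return policy, g, h
        policy = q.argmin(axis=0)       # improvement step

# A tiny sanity check: two states, two actions, all chains ergodic.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
c = np.array([[2.0, 1.0],
              [0.5, 3.0]])
policy, g, h = policy_iteration(P, c)
```

In the linear-programming formulation of the same problem, these discrepancies appear as the reduced costs of the LP, which is why tests of this kind carry over to simplex-based methods (cf. Cheng [3] on simplex criteria).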

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1994 

References

[1] Bean, J., Hopp, W. and Duenyas, I. (1993) A stopping rule for forecast horizons in nonhomogeneous Markov decision processes. Operat. Res. 40, 1188–1199.
[2] Bes, C. and Lasserre, J. B. (1986) An on-line procedure in discounted infinite horizon stochastic optimal control. J. Optim. Theory Appl. 50, 61–67.
[3] Cheng, M. C. (1980) New criteria for the simplex algorithm. Math. Progr. 19, 230–236.
[4] Denardo, E. V. and Fox, B. (1968) Multichain renewal programs. SIAM J. Appl. Math. 16, 468–487.
[5] Hastings, N. A. J. (1976) A test for non-optimal actions in undiscounted finite Markov decision chains. Management Sci. 23, 87–92.
[6] Howard, R. (1960) Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA.
[7] Kallenberg, L. M. C. (1983) Linear Programming and Finite Markovian Control Problems. Mathematical Centre Tracts, Mathematisch Centrum, Amsterdam.
[8] Lasserre, J. B. (1994) A new policy iteration scheme for Markov decision processes using Schweitzer's formula. J. Appl. Prob. 31, 268–273.
[9] Puterman, M. (1990) Markov decision processes. In Handbooks in Operations Research and Management Science 2: Stochastic Models, ed. Heyman, D. P. and Sobel, M., North-Holland, Amsterdam.
[10] Van Der Wal, J. (1976) A successive approximation algorithm. Computing 17, 157–162.