
The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms

Published online by Cambridge University Press:  14 July 2016

A. Federgruen
Affiliation:
Mathematisch Centrum, Amsterdam
H. C. Tijms
Affiliation:
Vrije Universiteit, Amsterdam

Abstract

This paper is concerned with the optimality equation for the average costs in a denumerable state semi-Markov decision model. We show that under each of a number of recurrency conditions on the transition probability matrices associated with the stationary policies, the optimality equation has a bounded solution, and that this solution yields a stationary policy which is optimal for a strong version of the average cost optimality criterion. Beyond establishing the existence of a bounded solution to the optimality equation, we show that both the value-iteration method and the policy-iteration method can be used to determine such a solution. For the latter method we prove that the average costs and the relative cost functions of the policies generated converge to a solution of the optimality equation.
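The policy-iteration scheme mentioned above alternates between evaluating a stationary policy (its average cost g and relative cost function h) and improving it via the optimality equation. A minimal sketch of this idea follows, under assumptions that are ours rather than the paper's: a finite state space instead of a denumerable one, unit sojourn times (so the semi-Markov model reduces to an ordinary Markov decision model), and a unichain structure so that the evaluation system with the normalization h(0) = 0 is nonsingular.

```python
import numpy as np

def policy_iteration(P, c, max_iter=100):
    """Average-cost policy iteration for a finite unichain MDP.

    P[a] is the transition matrix under action a (shape n x n),
    c[a] the one-step cost vector under action a (shape n).
    Returns (g, h, policy): the average cost, the relative cost
    function normalized by h[0] = 0, and an optimal stationary policy.
    """
    n = P.shape[1]
    policy = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve g + h(i) = c(i) + sum_j p(i,j) h(j)
        # for all i, with the normalization h(0) = 0. The unknown vector
        # is (g, h(1), ..., h(n-1)); the h(0)-column of I - P is replaced
        # by a column of ones, the coefficients of g.
        Pd = P[policy, np.arange(n)]   # row i taken from P[policy[i]]
        cd = c[policy, np.arange(n)]
        A = np.eye(n) - Pd
        A[:, 0] = 1.0
        sol = np.linalg.solve(A, cd)
        g, h = sol[0], sol.copy()
        h[0] = 0.0
        # Policy improvement: minimize c(i,a) + sum_j p(i,j,a) h(j) over a.
        q = c + P @ h                  # shape (num_actions, n)
        new_policy = np.argmin(q, axis=0)
        if np.array_equal(new_policy, policy):
            break                      # policy is stable: optimality equation holds
        policy = new_policy
    return g, h, policy
```

In this finite unichain setting the relative cost functions h of the successive policies are exactly the quantities whose convergence (together with the average costs g) to a solution of the optimality equation is studied in the paper, in the far more delicate denumerable-state case.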

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1978 


References

[1] Anthonisse, J. M. and Tijms, H. C. (1977) Exponential convergence of products of stochastic matrices. J. Math. Anal. Appl. 59, 360–364.
[2] De Leve, G., Federgruen, A. and Tijms, H. C. (1976) A general Markov decision method, I: model and method. Adv. Appl. Prob. 9, 296–315.
[3] De Leve, G., Federgruen, A. and Tijms, H. C. (1977) Generalized Markovian Decision Processes, Revisited. Mathematical Centre Tract, Mathematisch Centrum, Amsterdam. To appear.
[4] Derman, C. (1966) Denumerable state Markovian decision processes: average cost criterion. Ann. Math. Statist. 37, 1545–1553.
[5] Derman, C. and Veinott, A. Jr. (1967) A solution to a countable system of equations arising in Markovian decision processes. Ann. Math. Statist. 38, 582–584.
[6] Doob, J. L. (1953) Stochastic Processes. Wiley, New York.
[7] Federgruen, A., Schweitzer, P. J. and Tijms, H. C. (1977) Contraction mappings underlying undiscounted Markov decision problems. J. Math. Anal. Appl. To appear.
[8] Flynn, J. (1977) Conditions for the equivalence of optimality criteria in dynamic programming. Ann. Statist. 41, 936–953.
[9] Hajnal, J. (1958) Weak ergodicity in non-homogeneous Markov chains. Proc. Camb. Phil. Soc. 54, 233–246.
[10] Hastings, N. A. J. (1971) Bounds on the gain of Markov decision processes. Opns Res. 10, 240–243.
[11] Hordijk, A. (1974) Dynamic Programming and Potential Theory. Mathematical Centre Tract No. 51, Mathematisch Centrum, Amsterdam.
[12] Hordijk, A., Schweitzer, P. J. and Tijms, H. C. (1975) The asymptotic behaviour of the minimal total expected cost for the denumerable state Markov decision model. J. Appl. Prob. 12, 298–305.
[13] Hordijk, A. and Sladky, K. (1975) Sensitive optimality criteria in countable state dynamic programming. Maths Opns Res. 2, 1–13.
[14] Lippman, S. A. (1975) On dynamic programming with unbounded rewards. Management Sci. 21, 1225–1233.
[15] Maitra, A. (1968) Discounted dynamic programming on compact metric spaces. Sankhya A 30, 211–216.
[16] Robinson, D. R. (1976) Markov decision chains with unbounded costs and applications to the control of queues. Adv. Appl. Prob. 8, 159–176.
[17] Ross, S. M. (1970) Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.
[18] Royden, H. L. (1968) Real Analysis, 2nd edn. Macmillan, New York.
[19] Schweitzer, P. J. (1971) Iterative solution of the functional equations of undiscounted Markov renewal programming. J. Math. Anal. Appl. 34, 495–501.
[20] Tijms, H. C. (1975) On dynamic programming with arbitrary state space, compact action space and the average return as criterion. Report BW 55/75, Mathematisch Centrum, Amsterdam.