Hostname: page-component-78c5997874-g7gxr Total loading time: 0 Render date: 2024-11-19T06:13:19.817Z Has data issue: false hasContentIssue false

Semi-Markov decision processes with polynomial reward

Published online by Cambridge University Press:  14 July 2016

Zvi Rosberg*
Affiliation:
Technion — Israel Institute of Technology
*
Postal address: Department of Computer Science, Technion — Israel Institute of Technology, Haifa 32000, Israel.

Abstract

A semi-Markov decision process, with a denumerable multidimensional state space, is considered. At any given state only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is merely assumed to be bounded by a polynomial and a bound is imposed on a weighted moment of the next state reached in one transition. It is shown that under an ergodicity assumption there is a stationary optimal policy for the long-run average reward criterion. A queueing network scheduling problem, for which previous criteria are inapplicable, is given as an application.

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1982 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

This study was initiated in the course of the author's visit to the Department of Business Administration of the University of Illinois, and concluded with the support of the National Science Foundation under Grant ENG-7903879A01 in the course of his visit to Department EECS of the University of California at Berkeley.

References

[1] Federgruen, A., Hordijk, A. and Tijms, H. C. (1979) Denumerable state semi-Markov decision processes with unbounded costs, average cost criterion. Stoch. Proc. Appl. 9, 223235.Google Scholar
[2] Fisher, L. (1968) On recurrent denumerable decision processes. Ann. Math. Statist. 39, 424434.Google Scholar
[3] Lippman, S. A. (1973) Semi-Markov decision processes with unbounded reward. Management Sci. 19, 717731.Google Scholar
[4] Lippman, S. A. (1975) On dynamic programming with unbounded rewards. Management Sci. 21, 12251233.Google Scholar
[5] Nenen, J. V. and Wessels, J. (1979) Successive approximations for Markov decision processes and Markov games with unbounded rewards. Math. Operationsforsch. Statist. Ser. Optimization 19, 431455.CrossRefGoogle Scholar
[6] Rosberg, Z. (1980) A positive recurrence criterion associated with multi-dimensional queueing networks. J. Appl. Prob. 17, 790801.CrossRefGoogle Scholar
[7] Rosberg, Z., Varaiya, P. and Walrand, J. (1982) Optimal control of service in tandem queues. IEEE Trans. Automatic Control AC-27.Google Scholar
[8] Ross, S. M. (1970) Average cost semi-Markov decision processes. J. Appl. Prob. 7, 649659.Google Scholar