
Bias Optimality in Controlled Queueing Systems

Published online by Cambridge University Press:  14 July 2016

Moshe Haviv*
Affiliation:
The Hebrew University of Jerusalem
Martin L. Puterman*
Affiliation:
University of British Columbia
* Postal address: Department of Econometrics, The University of Sydney, Sydney, NSW 2006, Australia and Department of Statistics, The Hebrew University of Jerusalem, Jerusalem 91905, Israel. e-mail address: [email protected]
** Postal address: Faculty of Commerce and Business, University of British Columbia, 2053 Main Mall, Vancouver, BC, Canada V6T 1Z2. e-mail address: [email protected]

Abstract

This paper studies an M/M/1 queueing system with admission control. It shows that, among stationary policies whose gain and bias satisfy the optimality equation, the only gain (average) optimal policies are of control-limit type; that there are at most two such policies; and that, when there are two, their control limits are consecutive integers. Conditions ensuring the existence of two gain optimal control-limit policies are provided and illustrated with an example. The main result is that bias optimality distinguishes between these two gain optimal policies: the policy with the larger control limit is the unique bias optimal stationary policy, and consequently it is also Blackwell optimal. This result is established by appealing to the third optimality equation of the Markov decision process, together with observations concerning the structure of solutions of the second optimality equation.
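The setting in the abstract can be made concrete with a small numerical sketch. Under a control-limit policy with limit L, an arriving customer is admitted if and only if fewer than L customers are present, so the controlled queue behaves as an M/M/1/L loss system with a geometric stationary distribution. The gain of the policy is then the long-run reward rate from admissions minus the long-run holding-cost rate. The reward and cost parameters below (arrival rate lam, service rate mu, admission reward R, holding cost rate c) are hypothetical choices for illustration, not values from the paper; a near-tie in gain between two consecutive limits corresponds to the paper's case of two gain optimal control-limit policies, which bias optimality then separates.

```python
# Illustrative sketch (hypothetical parameters, not from the paper):
# gain of control-limit admission policies for an M/M/1 queue.

def gain(L, lam=1.0, mu=2.0, R=3.0, c=1.0):
    """Long-run average reward of the control limit L: admit an arrival
    iff fewer than L customers are present. The controlled queue is an
    M/M/1/L loss system with stationary distribution pi_n proportional
    to rho**n, where rho = lam/mu."""
    if L == 0:
        return 0.0  # never admit: the queue stays empty, zero reward and cost
    rho = lam / mu
    weights = [rho ** n for n in range(L + 1)]
    Z = sum(weights)
    pi = [w / Z for w in weights]
    # Admissions occur at rate lam in every state below L.
    reward_rate = lam * R * (1.0 - pi[L])
    # Holding cost accrues at rate c per customer in the system.
    holding_rate = c * sum(n * p for n, p in enumerate(pi))
    return reward_rate - holding_rate

gains = {L: gain(L) for L in range(11)}
best = max(gains, key=gains.get)
```

With these particular parameters the gain is maximized by a single control limit, and the gains of the neighbouring limits are close; tuning the parameters to produce an exact tie between two consecutive limits yields the two-policy situation the paper analyzes.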

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1998 

