
Open bandit processes and optimal scheduling of queueing networks

Published online by Cambridge University Press:  01 July 2016

Tze Leung Lai*
Affiliation:
Stanford University
Zhiliang Ying*
Affiliation:
Columbia University
* Postal address: Department of Statistics, Stanford University, Stanford, CA 94305, USA.
** Postal address: Department of Statistics, Box 10 Mathematics, Columbia University, New York, NY 10027, USA.

Abstract

Asymptotic approximations are developed herein for the optimal policies in discounted multi-armed bandit problems in which new projects are continually appearing, commonly known as ‘open bandit problems’ or ‘arm-acquiring bandits’. It is shown that under certain stability assumptions the open bandit problem is asymptotically equivalent to a closed bandit problem in which there is no arrival of new projects, as the discount factor approaches 1. Applications of these results to optimal scheduling of queueing networks are given. In particular, Klimov's priority indices for scheduling queueing networks are shown to be limits of the Gittins indices for the associated closed bandit problem, and extensions of Klimov's results to preemptive policies and to unstable queueing systems are given.
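Klimov-type priority indices reduce to the classical cµ rule in the simplest setting of a multiclass queue without feedback: each class receives index c·µ (holding cost times service rate), and the server always works on the waiting class with the largest index. As a minimal illustration only (this is not the paper's construction, and the class names, costs, and rates below are hypothetical), the cµ indices and the resulting priority order can be computed as follows:

```python
# Illustrative sketch: the classical c-mu priority rule, the simplest
# special case of Klimov-type priority indices (multiclass queue, no
# feedback). All numbers below are made-up example data.

def cmu_indices(holding_costs, service_rates):
    """Priority index for each job class: c_k * mu_k.

    A higher index means the class should be served first.
    """
    return [c * mu for c, mu in zip(holding_costs, service_rates)]

def priority_order(holding_costs, service_rates):
    """Class labels sorted by decreasing c-mu index (ties keep input order)."""
    idx = cmu_indices(holding_costs, service_rates)
    return sorted(range(len(idx)), key=lambda k: -idx[k])

# Example: three hypothetical job classes.
c = [4.0, 1.0, 2.0]    # holding cost per unit time in queue
mu = [0.5, 3.0, 1.0]   # service rate = 1 / (mean service time)

print(cmu_indices(c, mu))     # -> [2.0, 3.0, 2.0]
print(priority_order(c, mu))  # -> [1, 0, 2]: class 1 has top priority
```

In the paper's setting the indices are the more general Gittins/Klimov indices, which account for feedback (jobs changing class after service) and are obtained as discount-factor-1 limits rather than by this closed-form product.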

Type: Research Article
Copyright © Applied Probability Trust 1988


Footnotes

Research supported by the National Science Foundation and the Army Research Office.

References

[1] Cox, D. R. and Smith, W. L. (1961) Queues. Methuen, London.
[2] Derman, C. (1970) Finite State Markovian Decision Processes. Academic Press, New York.
[3] Gittins, J. C. (1979) Bandit processes and dynamic allocation indices. J. R. Statist. Soc. B 41, 148–177.
[4] Gittins, J. C. and Glazebrook, K. D. (1977) On Bayesian models in stochastic scheduling. J. Appl. Prob. 14, 556–565.
[5] Gittins, J. C. and Jones, D. M. (1972) A dynamic allocation index for the sequential design of experiments. Paper read at the European Meeting of Statisticians, Budapest. In Progress in Statistics (ed. Gani, J. et al., North-Holland, Amsterdam, 1974), 241–266.
[6] Klimov, G. P. (1974) Time-sharing service systems I. Theory Prob. Appl. 19, 532–551.
[7] Klimov, G. P. (1978) Time-sharing service systems II. Theory Prob. Appl. 23, 314–321.
[8] Mandelbaum, A. (1986) Discrete multi-armed bandits and multi-parameter processes. Prob. Theory Rel. Fields 71, 129–147.
[9] Nash, P. (1973) Optimal Allocation of Resources between Research Projects. Ph.D. Thesis, Cambridge University.
[10] Rao, C. R. (1973) Linear Statistical Inference and Its Applications. Wiley, New York.
[11] Tcha, D. and Pliska, S. R. (1977) Optimal control of single-server queueing networks and multi-class M/G/1 queues with feedback. Operat. Res. 25, 248–258.
[12] Varaiya, P., Walrand, J. C. and Buyukkoc, C. (1985) Extensions of the multiarmed bandit problem: the discounted case. IEEE Trans. Autom. Contr. 30, 426–439.
[13] Whittle, P. (1980) Multi-armed bandits and the Gittins index. J. R. Statist. Soc. B 42, 143–149.
[14] Whittle, P. (1981) Arm-acquiring bandits. Ann. Prob. 9, 284–292.
[15] Whittle, P. (1982) Optimization Over Time: Dynamic Programming and Stochastic Control, Vol. 1. Wiley, New York.