
Independently Expiring Multiarmed Bandits

Published online by Cambridge University Press: 27 July 2009

Rhonda Righter
Affiliation:
Department of Operations and Management Information Systems, Santa Clara University, Santa Clara, California 95053
J. George Shanthikumar
Affiliation:
Department of Industrial Engineering and Operations Research and Walter A. Haas School of Business, University of California, Berkeley, Berkeley, California 94720

Abstract

We give conditions for the optimality of an index policy for multiarmed bandits when arms expire independently. We also give a new, simple proof of the optimality of the Gittins index policy for the classic multiarmed bandit problem.
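To make the classic Gittins index policy mentioned in the abstract concrete, here is a minimal sketch. It assumes each arm is a finite-state Markov chain with per-state rewards `reward`, a transition matrix `trans`, and a discount factor `beta` (all hypothetical names introduced for illustration). Indices are computed with the largest-remaining-index algorithm of Varaiya, Walrand, and Buyukkoc, which is one known method and not necessarily the approach taken in this article. The sketch does not model arms that expire.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gittins_indices(reward: Sequence[float],
                    trans: Sequence[Sequence[float]],
                    beta: float) -> List[float]:
    """Gittins index of every state of one discounted Markov arm (0 <= beta < 1).

    States are ranked from highest to lowest index. At each step the arm
    keeps playing while it stays in the already-ranked (higher-index) set
    `cont`, and the next state is the one maximizing expected discounted
    reward divided by expected discounted time until leaving `cont`.
    """
    n = len(reward)
    index = [0.0] * n
    cont: set = set()
    for _ in range(n):
        # a = I - beta * P_C, where P_C keeps only transitions into `cont`;
        # it is strictly diagonally dominant for beta < 1, hence invertible.
        a = [[(1.0 if i == j else 0.0) - (beta * trans[i][j] if j in cont else 0.0)
              for j in range(n)] for i in range(n)]
        num = _solve(a, [float(r) for r in reward])  # discounted reward
        den = _solve(a, [1.0] * n)                   # discounted time
        best = max((i for i in range(n) if i not in cont),
                   key=lambda i: num[i] / den[i])
        index[best] = num[best] / den[best]
        cont.add(best)
    return index


def index_policy(arm_indices: Sequence[Sequence[float]],
                 states: Sequence[int]) -> int:
    """Play the arm whose current state has the largest Gittins index."""
    return max(range(len(arm_indices)),
               key=lambda k: arm_indices[k][states[k]])
```

For example, an arm that pays 0 in state 0, then moves to an absorbing state 1 paying 1, has indices `[0.5, 1.0]` when `beta = 0.5`. Against a one-state arm paying 0.7, the index policy plays that safe arm first, because 0.7 exceeds 0.5.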

Type
Research Article
Copyright © Cambridge University Press 1998

