
Policies without Memory for the Infinite-Armed Bernoulli Bandit under the Average-Reward Criterion

Published online by Cambridge University Press:  27 July 2009

Stephen J. Herschkorn
Affiliation:
School of Business and RUTCOR, Rutgers University, New Brunswick, New Jersey 08903
Erol Peköz
Affiliation:
Department of Industrial Engineering and Operations Research, University of California, Berkeley, California 94720
Sheldon M. Ross
Affiliation:
Department of Industrial Engineering and Operations Research, University of California, Berkeley, California 94720

Abstract

We consider a bandit problem with infinitely many Bernoulli arms whose unknown parameters are i.i.d. We present two policies that maximize the almost sure average reward over an infinite horizon. Neither policy ever returns to a previously observed arm after switching to a new one or retains information from discarded arms, and runs of failures indicate the selection of a new arm. The first policy is nonstationary and requires no information about the distribution of the Bernoulli parameter. The second is stationary and requires only partial information; its optimality is established via renewal theory. We also develop ε-optimal stationary policies that require no information about the distribution of the unknown parameter and discuss universally optimal stationary policies.
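
The following is a minimal simulation sketch of the kind of memoryless, failure-run policy the abstract describes: keep pulling the current arm until a run of consecutive failures is observed, then discard it for good and draw a fresh arm. It is an illustration under stated assumptions, not the authors' exact construction. The function name `run_failure_policy`, the prior sampler `draw_arm`, and the threshold rule `run_length` are all hypothetical names introduced here. A constant threshold gives a stationary policy; a threshold that grows with the arm index is one way a nonstationary policy might be written.

```python
import random


def run_failure_policy(draw_arm, horizon, run_length, seed=0):
    """Simulate a failure-run policy and return the average reward.

    draw_arm(rng)  -> success probability of a fresh arm (assumed prior sampler).
    run_length(i)  -> consecutive failures tolerated on the i-th arm (1-based).
    Both callables are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    total = 0
    arm_index = 1
    p = draw_arm(rng)   # parameter of the current arm, unknown to the policy
    failures = 0        # length of the current run of failures
    for _ in range(horizon):
        if rng.random() < p:
            total += 1
            failures = 0            # a success resets the failure run
        else:
            failures += 1
            if failures >= run_length(arm_index):
                # Discard the arm permanently; nothing about it is retained.
                arm_index += 1
                p = draw_arm(rng)
                failures = 0
    return total / horizon


if __name__ == "__main__":
    uniform = lambda rng: rng.random()  # assumed Uniform(0, 1) prior
    # Stationary variant: always switch after 5 consecutive failures.
    print(run_failure_policy(uniform, 100_000, lambda i: 5))
    # Nonstationary variant: the i-th arm tolerates i consecutive failures.
    print(run_failure_policy(uniform, 100_000, lambda i: i))
```

The policy state consists only of the current failure-run length and, in the nonstationary variant, the count of arms tried so far, which reflects the abstract's requirement that no information from discarded arms is kept.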

Type
Research Article
Copyright
Copyright © Cambridge University Press 1996

