
A conjecture on the Feldman bandit problem

Published online by Cambridge University Press: 28 March 2018

Maher Nouiehed
Affiliation: University of Southern California
Sheldon M. Ross
Affiliation: University of Southern California

Postal address: Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089, USA.

Abstract

We consider the Bernoulli bandit problem in which one of the arms has win probability α and all of the others have win probability β, with the identity of the α arm specified by initial probabilities. With u = max(α, β) and v = min(α, β), call an arm with win probability u a good arm. Whereas it is known that the strategy of always playing the arm with the largest probability of being a good arm maximizes the expected number of wins in the first n games for every n, we conjecture that it also stochastically maximizes the number of wins. That is, we conjecture that this strategy maximizes the probability of at least k wins in the first n games for all k and n. The conjecture is proven when k = 1, when k = n, and when there are only two arms and k = n - 1.
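For readers who wish to experiment with the conjecture numerically, the following is a minimal Monte Carlo sketch in Python (not from the paper; the function names and parameter values are illustrative assumptions) of the strategy described above: maintain the posterior probability that each arm is the α arm, always play the arm currently most likely to be a good arm, and estimate the probability of obtaining at least k wins in n games.

import random

def play_myopic(alpha, beta, prior, n, rng=random):
    """Simulate n plays of the Feldman bandit under the myopic rule:
    always play the arm currently most likely to be a good arm
    (win probability u = max(alpha, beta)).  Returns the number of wins.
    prior[i] is the initial probability that arm i is the alpha arm.
    (Illustrative sketch; not code from the paper.)"""
    p = list(prior)  # posterior probability that each arm is the alpha arm
    true_alpha_arm = rng.choices(range(len(p)), weights=p)[0]
    wins = 0
    for _ in range(n):
        # Probability that each arm is a good arm (the u-arm).
        good = p if alpha >= beta else [1 - q for q in p]
        j = max(range(len(p)), key=lambda i: good[i])
        win = rng.random() < (alpha if j == true_alpha_arm else beta)
        wins += win
        # Bayes update of the posterior after observing the outcome on arm j.
        like = [(alpha if i == j else beta) if win
                else ((1 - alpha) if i == j else (1 - beta))
                for i in range(len(p))]
        total = sum(q * l for q, l in zip(p, like))
        p = [q * l / total for q, l in zip(p, like)]
    return wins

def prob_at_least_k(alpha, beta, prior, n, k, trials=100_000, seed=1):
    """Monte Carlo estimate of P(at least k wins in n games) under the myopic rule."""
    rng = random.Random(seed)
    hits = sum(play_myopic(alpha, beta, prior, n, rng) >= k for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    # Two arms, uniform prior; estimate P(at least 8 wins in 10 games).
    print(prob_at_least_k(alpha=0.7, beta=0.4, prior=[0.5, 0.5], n=10, k=8))

Such a simulation only estimates the win distribution under the conjectured optimal strategy; comparing it against alternative strategies on small instances is one way to probe the conjecture empirically, not a proof.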

Type
Short Communications
Copyright
Copyright © Applied Probability Trust 2018 


References

[1] Feldman, D. (1962). Contributions to the "two-armed bandit" problem. Ann. Math. Statist. 33, 847–856.
[2] Rodman, L. (1978). On the many-armed bandit problem. Ann. Prob. 6, 491–498.
[3] Presman, É. L. and Sonin, I. N. (1990). Sequential Control with Incomplete Information: The Bayesian Approach to Multi-Armed Bandit Problems. Academic Press, San Diego, CA.