
Two-armed bandits with a goal, II. Dependent arms

Published online by Cambridge University Press:  01 July 2016

Donald A. Berry*
Affiliation:
University of Minnesota
Bert Fristedt*
Affiliation:
University of Minnesota
∗ Postal address: School of Statistics, University of Minnesota, 270 Vincent Hall, 206 Church St S.E., Minneapolis, MN 55455, U.S.A.
∗∗ Postal address: School of Mathematics, University of Minnesota, 270 Vincent Hall, 206 Church St S.E., Minneapolis, MN 55455, U.S.A.

Abstract

One of two random variables, X and Y, can be selected at each of a possibly infinite number of stages. Depending on the outcome, one's fortune is either increased or decreased by 1. The probability of increase may not be known for either X or Y. The objective is to increase one's fortune to G before it decreases to g, for some integral g and G; either may be infinite.

In Part I (Berry and Fristedt (1980)), the distribution of X is unknown and that of Y is known. In the current part, it is known that one of X and Y has probability α of increasing the current fortune by 1 and the other has probability β of doing so, where α and β are known, but it is not known which probability goes with X. We show that optimal strategies exist in general, and we find all optimal strategies when α = 0 and when α + β = 1. In both cases myopic strategies are shown to be optimal. A counterexample shows that myopic strategies, while intuitively very appealing, are not optimal for general (α, β).
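The myopic strategy discussed in the abstract can be illustrated by a short simulation. The sketch below is not from the paper; it is a minimal illustration under the stated setup, assuming a uniform prior over which arm carries probability α. The gambler maintains the posterior probability p that arm 0 is the α-arm, always pulls the arm with the larger posterior mean success probability (the myopic rule), and plays until the fortune reaches the goal G or falls to g.

```python
import random

def myopic_bandit(alpha, beta, fortune, goal, ruin, p0=0.5, rng=None):
    """Simulate the myopic strategy for the dependent-arms goal problem.

    One arm succeeds with probability alpha, the other with beta; which
    is which is unknown.  `p` tracks the posterior probability that
    arm 0 is the alpha-arm, starting from the prior p0.  Each success
    adds 1 to the fortune, each failure subtracts 1.  Returns True if
    the fortune reaches `goal` before falling to `ruin`.
    """
    rng = rng or random.Random()
    # Hidden truth: assign alpha to one of the two arms at random.
    arm_probs = [alpha, beta] if rng.random() < 0.5 else [beta, alpha]
    p = p0
    while ruin < fortune < goal:
        # Posterior mean success probability of each arm.
        m0 = p * alpha + (1 - p) * beta
        m1 = p * beta + (1 - p) * alpha
        arm = 0 if m0 >= m1 else 1          # myopic choice
        success = rng.random() < arm_probs[arm]
        fortune += 1 if success else -1
        # Bayes update of p given the outcome observed on `arm`.
        if arm == 0:
            like_a = alpha if success else 1 - alpha  # arm 0 is alpha-arm
            like_b = beta if success else 1 - beta    # arm 0 is beta-arm
        else:
            like_a = beta if success else 1 - beta
            like_b = alpha if success else 1 - alpha
        p = p * like_a / (p * like_a + (1 - p) * like_b)
    return fortune >= goal

# Estimate the probability of reaching the goal under the myopic rule.
rng = random.Random(0)
wins = sum(myopic_bandit(0.7, 0.3, 0, 5, -5, rng=rng) for _ in range(2000))
print(wins / 2000)
```

The paper's counterexample shows that this rule, despite its appeal, is not optimal for all (α, β); the simulation only estimates the myopic rule's goal-reaching probability, it does not compute the optimal value.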

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1980 


Footnotes

∗ This author's research was sponsored by the NSF under Grant No. MCS 78-02694.

∗∗ This author's research was sponsored by the NSF under Grant No. MCS 78-01168 A01.

References

Berry, D. A. (1972) A Bernoulli two-armed bandit. Ann. Math. Statist. 43, 871–897.
Berry, D. A. and Fristedt, B. (1980) Two-armed bandits with a goal, I. One arm known. Adv. Appl. Prob. 12, 775–798.
DeGroot, M. H. (1970) Optimal Statistical Decisions. McGraw-Hill, New York.
Fabius, J. and van Zwet, W. R. (1970) Some remarks on the two-armed bandit. Ann. Math. Statist. 41, 1906–1916.
Feldman, D. (1962) Contributions to the ‘two-armed bandit’ problem. Ann. Math. Statist. 33, 847–856.
Kelly, T. A. (1974) A note on the Bernoulli two-armed bandit. Ann. Statist. 2, 1056–1062.