Two-armed bandits with a goal, II. Dependent arms
Published online by Cambridge University Press: 01 July 2016
Abstract
One of two random variables, X and Y, can be selected at each of a possibly infinite number of stages. Depending on the outcome, one's fortune is either increased or decreased by 1. The probability of increase may not be known for either X or Y. The objective is to increase one's fortune to G before it decreases to g, for some integral g and G; either may be infinite.
In Part I (Berry and Fristedt (1980)), the distribution of X is unknown and that of Y is known. In the current part, it is known that either X or Y has probability α of increasing the current fortune by 1 and the other has probability β of increasing the fortune by 1, where α and β are known, but which goes with X is not known. We show that optimal strategies exist in general and find all optimal schemes when α = 0 and when α + β = 1. In both cases myopic strategies are shown to be optimal. A counterexample is used to show that myopic strategies, while intuitively very appealing, are not optimal for general (α, β).
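The dependent-arm setup can be illustrated with a small sketch. It is not taken from the paper: it assumes that "myopic" means choosing the arm with the larger current probability of increasing the fortune, and it tracks a single belief p, the probability that X is the arm with success probability α. The function names (success_prob, posterior_update, myopic_arm) and the tie-breaking rule are hypothetical choices made for illustration.

```python
def success_prob(p, arm, alpha, beta):
    """Current probability that pulling `arm` raises the fortune by 1.

    p is the current probability that X is the arm with success
    probability alpha (so Y then has success probability beta).
    """
    if arm == "X":
        return p * alpha + (1 - p) * beta
    return p * beta + (1 - p) * alpha


def posterior_update(p, arm, success, alpha, beta):
    """Bayes update of p after observing one outcome on `arm`."""
    # Success probability of the pulled arm under each hypothesis.
    q_if_x_is_alpha = alpha if arm == "X" else beta
    q_if_x_is_beta = beta if arm == "X" else alpha
    like_a = q_if_x_is_alpha if success else 1 - q_if_x_is_alpha
    like_b = q_if_x_is_beta if success else 1 - q_if_x_is_beta
    denom = p * like_a + (1 - p) * like_b
    if denom == 0:
        return p  # outcome has probability zero under the current belief
    return p * like_a / denom


def myopic_arm(p, alpha, beta):
    """Assumed myopic rule: larger current success probability, ties to X."""
    px = success_prob(p, "X", alpha, beta)
    py = success_prob(p, "Y", alpha, beta)
    return "X" if px >= py else "Y"
```

With α = 0, a single success on X settles the question: posterior_update(0.5, "X", True, 0.0, 0.5) drives p to 0, so X must be the β arm and the assumed myopic rule stays with X from then on. A failure on X instead shifts belief toward X being the α arm, and the rule switches to Y.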
- Type: Research Article
- Copyright © Applied Probability Trust 1980
Footnotes
This author's research sponsored by the NSF under Grant No. MCS 78-02694.
This author's research sponsored by the NSF under Grant No. MCS 78-01168 A01.