Published online by Cambridge University Press: 01 July 2016
Gittins' theory of allocation indices, which define the optimal policy in multi-armed bandit problems, is presented for the continuous-time case, where the projects (or ‘arms’) are strong Markov processes. Complications peculiar to the continuous-time case are discussed. These motivate an investigation of whether approximating a continuous-time problem by discrete-time versions is a valid technique, in the sense that the allocation indices and the optimal expected rewards converge. Conditions are presented under which this convergence holds.