Hostname: page-component-cd9895bd7-jn8rn Total loading time: 0 Render date: 2024-12-24T00:59:16.830Z Has data issue: false hasContentIssue false

Continuous-time allocation indices and their discrete-time approximation

Published online by Cambridge University Press:  01 July 2016

W. J. R. Eplett*
Affiliation:
University of Oxford
*
Present address: Department of Statistics, University of Rochester, River Campus, Rochester, NY 14627, USA.

Abstract

The theory of allocation indices for defining the optimal policy in multi-armed bandit problems developed by Gittins is presented in the continuous-time case where the projects (or ‘arms’) are strong Markov processes. Complications peculiar to the continuous-time case are discussed. This motivates investigation of whether approximation of the continuous-time problems by discrete-time versions provides a valid technique with convergent allocation indices and optimal expected rewards. Conditions are presented under which the convergence holds.

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1986 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Bensoussan, A. and Lions, J. L. (1978) Applications des Inéquations variationnelles en Contrôle stochastique. Dunod, Paris.Google Scholar
El Karoui, N. (1981) Les aspects probabilistes du contrôle stochastique. In Springer Lecture Notes in Mathematics 876, 74238.Google Scholar
Gittins, J. C. (1979) Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B 41, 148177.Google Scholar
Gittins, J. C. and Jones, D. M. (1974) A dynamic allocation index for the sequential design of experiments. In Progress in Statistics ed. Gani, J., North Holland, Amsterdam, 241266.Google Scholar
Ikeda, N. and Watanabe, S. (1981) Stochastic Differential Equations and Diffusion Processes. North-Holland, Oxford.Google Scholar
Karatzas, I. (1984) Gittins indices in the dynamic allocation problem for diffusion processes. Ann. Prob. 12, 173192.Google Scholar
Nash, P. and Gittins, J. C. (1977) A Hamiltonian approach to optimal stochastic resource allocation. Adv. Appl. Prob. 9, 5568.Google Scholar
Perkins, E. (1981) A global intrinsic characterization of Brownian local time. Ann. Prob. 9, 800817.Google Scholar
Reuter, G. E. H. (1976) Denumerable Markov processes (IV): on C. T. Hou&s uniqueness theorem for Q-semigroups. Z. Wahrscheinlichkeitsth. 33, 309315.Google Scholar
Ross, S. M. (1970) Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.Google Scholar
Stroock, D. W. and Varadhan, S. R. S. (1979) Multidimensional Diffusion Processes. Springer-Verlag, Berlin.Google Scholar
Whittle, P. (1980) Multi-armed bandits and the Gittins index. J. R. Statist. Soc. B 42, 143149.Google Scholar
Williams, D. (1979) Diffusions, Markov Processes, and Martingales , Vol. I. Wiley, New York.Google Scholar