On transforming an index for generalised bandit problems

K. D. Glazebrook; S. Greatrix

doi:10.2307/3214927

Published online by Cambridge University Press: 14 July 2016

Affiliation: K. D. Glazebrook* and S. Greatrix*, University of Newcastle upon Tyne
* Postal address: Department of Mathematics and Statistics, University of Newcastle upon Tyne, NE1 7RU, UK.

Abstract

Nash (1980) demonstrated that index policies are optimal for a class of generalised bandit problems. A transform of the index concerned has many of the attributes of the Gittins index. The transformed index is positive-valued, with maximal values yielding optimal actions. It may be characterised as the value of a restart problem and is hence computable via dynamic programming methodologies. The transformed index can also be used in procedures for policy evaluation.
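To illustrate what "the value of a restart problem" means computationally, the sketch below applies value iteration to the classical restart characterisation of the Gittins index for a single finite-state Markov reward chain, in the style of Katehakis and Veinott (1987) from the reference list. It is not the transformed index for Nash's generalised bandits, whose definition does not appear in this abstract. The function name `restart_index` and the inputs `P` (transition matrix), `r` (reward vector) and `beta` (discount factor) are assumptions made for this illustration only.

```python
import numpy as np

def restart_index(P, r, beta, i, tol=1e-10, max_iter=100_000):
    """Classical Gittins index of state i via a restart-in-i problem.

    Illustrative sketch only: in every state the controller either
    continues from the current state or restarts the chain in state i.
    The index is (1 - beta) times the optimal value at state i.
    """
    P = np.asarray(P, dtype=float)   # P[j, k]: transition probability j -> k
    r = np.asarray(r, dtype=float)   # r[j]: one-step reward in state j
    V = np.zeros(len(r))
    for _ in range(max_iter):
        cont = r + beta * (P @ V)    # value of continuing from each state
        restart = cont[i]            # value of restarting in state i
        V_new = np.maximum(cont, restart)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return (1.0 - beta) * V[i]

# Hypothetical two-state chain: state 0 pays 0 and moves to state 1,
# which is absorbing and pays 1 per step.
P = np.array([[0.0, 1.0],
              [0.0, 1.0]])
r = np.array([0.0, 1.0])
print(restart_index(P, r, beta=0.9, i=0))
```

For this toy chain the index of state 0 works out to approximately `beta` (here 0.9), since the best stopping rule never stops and the ratio of discounted reward to discounted time tends to `beta`. Plain value iteration is used for transparency; it converges geometrically at rate `beta`, so discount factors close to 1 need many iterations.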

Keywords

GITTINS INDEX; MARKOV DECISION PROCESS; POLICY EVALUATION; RESTART PROBLEM; VALUE ITERATION

MSC classification

Primary: 90C40: Markov and semi-Markov decision processes

Type: Research Papers
Information: Journal of Applied Probability, Volume 32, Issue 1, March 1995, pp. 168–182

DOI: https://doi.org/10.2307/3214927
Copyright © Applied Probability Trust 1995

References

Fay, N. A. and Glazebrook, K. D. (1987) On the scheduling of alternative stochastic jobs on a single machine. Adv. Appl. Prob. 19, 955–973.

Fay, N. A. and Glazebrook, K. D. (1989) A general model for the scheduling of alternative tasks on a single machine. Proc. Eng. Inf. Sci. 3, 199–221.

Fay, N. A. and Walrand, J. C. (1991) On approximately optimal index strategies for generalised arm problems. J. Appl. Prob. 28, 602–612.

Gittins, J. C. (1979) Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B 41, 148–177.

Gittins, J. C. (1989) Multi-armed Bandit Allocation Indices. Wiley, Chichester.

Gittins, J. C. and Jones, D. M. (1974) A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, ed. Gani, J. and Vince, I., 1, pp. 241–266. North-Holland, Amsterdam.

Glazebrook, K. D. (1982) On the evaluation of suboptimal strategies for families of alternative bandit processes. J. Appl. Prob. 19, 716–722.

Glazebrook, K. D. (1983) Optimal strategies for families of alternative bandit processes. IEEE Trans. Autom. Control 28, 858–861.

Glazebrook, K. D. (1990) Procedures for the evaluation of strategies for resource allocation in a stochastic environment. J. Appl. Prob. 27, 215–220.

Glazebrook, K. D. (1993) Indices for families of competing Markov decision processes with influence. Ann. Appl. Prob. 3, 1013–1032.

Glazebrook, K. D. and Fay, N. A. (1988) Evaluating strategies for generalised bandit problems. Int. J. Syst. Sci. 19, 1605–1613.

Glazebrook, K. D. and Greatrix, S. (1993) On scheduling influential stochastic tasks on a single machine. Eur. J. Operat. Res. 70, 405–424.

Glazebrook, K. D. and Owen, R. W. (1991) New results for generalised bandit processes. Int. J. Syst. Sci. 22, 479–494.

Katehakis, M. N. and Veinott, A. F. (1987) The multi-armed bandit problem: decomposition and computation. Math. Operat. Res. 12, 262–268.

Klimov, G. P. (1974) Time-sharing service systems I. Theory Prob. Appl. 19, 535–551.

Nash, P. (1980) A generalised bandit problem. J. R. Statist. Soc. B 42, 165–169.

Robinson, D. R. (1982) Algorithms for evaluating the dynamic allocation index. Operat. Res. Lett. 1, 72–74.

Veinott, A. F. (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40, 1635–1660.
