Hostname: page-component-cd9895bd7-p9bg8 Total loading time: 0 Render date: 2024-12-23T18:09:10.939Z Has data issue: false hasContentIssue false

Restless bandits, partial conservation laws and indexability

Published online by Cambridge University Press:  01 July 2016

José Niño-Mora*
Affiliation:
Universitat Pompeu Fabra, Barcelona
*
Postal address: Department of Economics and Business, Universitat Pompeu Fabra, 08005 Barcelona, Spain. Email address: [email protected]

Abstract

We show that if performance measures in a general stochastic scheduling problem satisfy partial conservation laws (PCL), which extend the generalized conservation laws (GCL) introduced by Bertsimas and Niño-Mora (1996), then the problem is solved optimally by a priority-index policy under a range of admissible linear performance objectives, with both this range and the optimal indices being determined by a one-pass adaptive-greedy algorithm that extends Klimov's: we call such scheduling problems PCL-indexable. We further apply the PCL framework to investigate the indexability property of restless bandits (two-action finite-state Markov decision chains) introduced by Whittle, obtaining the following results: (i) we present conditions on model parameters under which a single restless bandit is PCL-indexable, and hence indexable; membership of the class of PCL-indexable bandits is tested through a single run of the adaptive-greedy algorithm, which further computes the Whittle indices when the test is positive; this provides a tractable sufficient condition for indexability; (ii) we further introduce the subclass of GCL-indexable bandits (including classical bandits), which are indexable under arbitrary linear rewards. Our analysis is based on the achievable region approach to stochastic optimization, as the results follow from deriving and exploiting a new linear programming reformulation for single restless bandits.

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 2001 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

[1] Bertsimas, D. and Niño-Mora, J. (1996). Conservation laws, extended polymatroids and multiarmed bandit problems; a unified approach to indexable systems. Math. Operat. Res. 21, 257306.CrossRefGoogle Scholar
[2] Bertsimas, D. and Niño-Mora, J. (1999). Optimization of multiclass queueing networks with changeover times via the achievable region method: Part I, the single-station case. Math. Operat. Res. 24, 306330.CrossRefGoogle Scholar
[3] Bertsimas, D. and Niño-Mora, J. (2000). Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Operat. Res. 48, 8090.Google Scholar
[4] Chen, Y. R. and Katehakis, M. N. (1986). Linear programming for finite state multi-armed bandit problems. Math. Operat. Res. 11, 180183.CrossRefGoogle Scholar
[5] Dacre, M., Glazebrook, K. D. and Niño-Mora, J. (1999). The achievable region approach to the optimal control of stochastic systems (with discussion). J. R. Statist. Soc. B 61, 747791.CrossRefGoogle Scholar
[6] D'Epénoux, F., (1960). Sur un problème de production et de stockage dans l'aléatoire. RAIRO Rech. Opérat. 14, 3–16. English translation: A probabilistic production and inventory problem. Management Sci. 10 (1963), 98108.Google Scholar
[7] Faihe, Y. and Müller, J.-P. (1998). Behaviors coordination using restless bandits allocation indexes. In From Animals to Animats 5 (Proc. 5th Int. Conf. Simulation of Adaptive Behavior), eds Pfeifer, R., Blumberg, B., Meyer, J. A. and Wilson, S. W.. MIT Press, Cambridge, MA.Google Scholar
[8] Gittins, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B 41, 148177.Google Scholar
[9] Katehakis, M. N. and Veinott, A. F. Jr (1987). The multi-armed bandit problem: decomposition and computation. Math. Operat. Res. 12, 262268.CrossRefGoogle Scholar
[10] Klimov, G. P. (1974). Time sharing service systems I. Theory Prob. Appl. 19, 532551.CrossRefGoogle Scholar
[11] Manne, A. S. (1960). Linear programming and sequential decisions. Management Sci. 6, 259267.CrossRefGoogle Scholar
[12] Niño-Mora, J. (2000). Admission control to birth–death queueing systems, restless bandit indices, and partial conservation laws. Working paper, Department of Economics and Business, Universitat Pompeu Fabra.Google Scholar
[13] Papadimitriou, C. H. and Tsitsiklis, J. N. (1999). The complexity of optimal queueing network control. Math. Operat. Res. 24, 293305.Google Scholar
[14] Veatch, M. and Wein, L. M. (1996). Scheduling a make-to-stock queue: index policies and hedging points. Operat. Res. 44, 634647.Google Scholar
[15] Weber, R. R. and Weiss, G. (1990). On an index policy for restless bandits. J. Appl. Prob. 27, 637648.CrossRefGoogle Scholar
[16] Weber, R. R. and Weiss, G. (1991). Addendum to ‘On an index policy for restless bandits’. Adv. Appl. Prob. 23, 429430.CrossRefGoogle Scholar
[17] Whittle, P. (1988). Restless bandits: activity allocation in a changing world. In A Celebration of Applied Probability (J. Appl. Prob. 25A), ed. Gani, J.. Applied Probability Trust, Sheffield, pp. 287298.Google Scholar