Some results on two-armed bandits when both projects vary

Brendan O'Flaherty

doi:10.2307/3214424

Published online by Cambridge University Press: 14 July 2016

Affiliation: Columbia University
Postal address: Department of Economics, Columbia University, New York, NY 10027, USA.

Abstract

In the multi-armed bandit problem, the decision-maker must choose each period a single project to work on. From the chosen project she receives an immediate reward that depends on the current state of the project. Next period the chosen project makes a stochastic transition to a new state, but projects that are not chosen remain in the same state. What happens in a two-armed bandit context if projects not chosen do not remain in the same state? We derive two sufficient conditions for the optimal policy to be myopic: either the transition function for chosen projects has in a certain sense uniformly stronger stochastic dominance than the transition function for unchosen projects, or both transition processes are normal martingales, the variance of which is independent of the history of process choices.
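The setting described above can be illustrated with a minimal simulation sketch. It is not taken from the paper: the function name `simulate_myopic`, the identity reward (reward equals the current state), the discount factor, and the two fixed standard deviations are all assumptions made for illustration. Both arms follow driftless Gaussian random walks, so each is a normal martingale whose step variance depends only on whether the arm is currently chosen, not on the history of choices. The myopic policy simply picks the arm with the higher immediate reward.

```python
import random


def simulate_myopic(horizon, sigma_chosen=1.0, sigma_unchosen=0.5,
                    discount=0.95, seed=0):
    """Run a myopic policy on a two-armed bandit in which both arms move.

    Assumptions (hypothetical, for illustration only):
      * the reward from an arm equals its current state;
      * each arm's state is a driftless Gaussian random walk;
      * the chosen arm moves with std sigma_chosen, the other arm
        with std sigma_unchosen (0.0 recovers the classical frozen arm).
    Returns (history, total), where history is a list of
    ((state_0, state_1), chosen_arm) pairs and total is the discounted reward.
    """
    rng = random.Random(seed)
    states = [0.0, 0.0]
    history = []
    total = 0.0
    for t in range(horizon):
        # Myopic rule: take the arm with the larger immediate reward
        # (ties go to arm 0).
        arm = 0 if states[0] >= states[1] else 1
        history.append((tuple(states), arm))
        total += (discount ** t) * states[arm]
        # Both arms make a transition; only the step size differs.
        for i in range(2):
            sigma = sigma_chosen if i == arm else sigma_unchosen
            states[i] += rng.gauss(0.0, sigma)
    return history, total


if __name__ == "__main__":
    history, total = simulate_myopic(horizon=10)
    for (s0, s1), arm in history:
        print(f"states=({s0:+.3f}, {s1:+.3f})  chosen arm={arm}")
    print(f"discounted reward: {total:.3f}")
```

Setting `sigma_unchosen=0.0` reproduces the standard bandit assumption that unchosen projects stay in the same state. The sketch only simulates the myopic rule; it does not check whether that rule is optimal under either of the sufficient conditions stated in the abstract.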

Keywords

FROZEN PROJECTS; STOCHASTIC MONOTONICITY; SURPASSING; MYOPIC POLICIES; NORMAL MARTINGALE FUNCTIONS

Type: Short Communications
Information: Journal of Applied Probability, Volume 26, Issue 3, September 1989, pp. 655-658

DOI: https://doi.org/10.2307/3214424

