Some memoryless bandit policies
Published online by Cambridge University Press: 14 July 2016
Abstract
We consider a multiarmed bandit problem, where each arm when pulled generates independent and identically distributed nonnegative rewards according to some unknown distribution. The goal is to maximize the long-run average reward per pull with the restriction that any previously learned information is forgotten whenever a switch between arms is made. We present several policies and a peculiarity surrounding them.
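The setting can be made concrete with a small simulation. The sketch below implements one *hypothetical* memoryless rule (not a policy from the paper): keep pulling the current arm while its most recent reward exceeds a threshold, and on any switch pick a fresh arm uniformly at random, discarding all past observations, as the problem's restriction requires. The arm distributions, the threshold rule, and the function name are all illustrative assumptions.

```python
import random

def memoryless_threshold_policy(arms, pulls, threshold, rng=None):
    """Hypothetical memoryless bandit policy (illustrative, not from the paper).

    Keep pulling the current arm while its latest reward exceeds `threshold`;
    otherwise switch to a uniformly random arm, forgetting everything learned.
    `arms` is a list of zero-argument callables returning nonnegative rewards.
    Returns the average reward per pull over `pulls` pulls.
    """
    rng = rng or random.Random(0)
    current = rng.randrange(len(arms))
    total = 0.0
    for _ in range(pulls):
        reward = arms[current]()
        total += reward
        if reward <= threshold:
            # Switch: by the problem's restriction, no past information
            # survives, so the new arm is chosen with no memory of rewards.
            current = rng.randrange(len(arms))
    return total / pulls

# Two hypothetical arms with exponential rewards of different means.
arm_rng = random.Random(1)
arms = [lambda: arm_rng.expovariate(1.0),   # mean 1
        lambda: arm_rng.expovariate(0.5)]   # mean 2
avg = memoryless_threshold_policy(arms, pulls=100_000, threshold=0.5)
```

Because the better arm clears the threshold more often, the policy lingers on it longer, so the long-run average per pull lands between the two arm means even though nothing is remembered across switches.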
Type: Short Communications
Copyright © Applied Probability Trust 2003