Published online by Cambridge University Press: 01 July 2016
In the mathematical learning literature, reward-penalty rules have been studied in various decision-theoretic and game-theoretic contexts, including the multi-armed bandit problem. Here we propose an elaboration of Bather's randomised allocation indices that yields rules for the multi-armed bandit which are both reward-penalty and asymptotically optimal.
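To give a rough feel for what a randomised allocation index is, the following minimal sketch, under stated assumptions, plays a Bernoulli bandit. At each step it pulls the arm with the largest index, where the index is the empirical success rate plus a random perturbation that shrinks as the arm is sampled more. The perturbation form (an Exp(1) draw scaled by a hypothetical `1/sqrt(n)`) and the helper names `perturbed_index` and `run_bandit` are illustrative assumptions. This is neither Bather's exact index nor the elaboration proposed in this paper, and it makes no claim to the reward-penalty or asymptotic-optimality properties discussed above.

```python
import math
import random


def perturbed_index(successes, pulls, rng, scale=lambda n: 1.0 / math.sqrt(n)):
    """Empirical mean plus a random exploration bonus that shrinks with pulls.

    Hypothetical form: Exp(1) perturbation scaled by 1/sqrt(n); this is an
    illustrative choice, not Bather's index or the paper's elaboration.
    """
    return successes / pulls + scale(pulls) * rng.expovariate(1.0)


def run_bandit(probs, horizon, seed=0):
    """Play a Bernoulli bandit with success probabilities `probs`."""
    rng = random.Random(seed)
    k = len(probs)
    successes = [0] * k
    pulls = [0] * k
    # Pull each arm once so every empirical mean is defined.
    for i in range(k):
        pulls[i] += 1
        successes[i] += int(rng.random() < probs[i])
    # Afterwards, always pull the arm whose randomised index is largest.
    for _ in range(horizon - k):
        indices = [perturbed_index(successes[i], pulls[i], rng) for i in range(k)]
        arm = max(range(k), key=indices.__getitem__)
        pulls[arm] += 1
        successes[arm] += int(rng.random() < probs[arm])
    return pulls, successes


pulls, successes = run_bandit([0.2, 0.8], horizon=2000, seed=0)
print(pulls, successes)
```

Because the random bonus decays with the number of pulls, an inferior arm is sampled less and less often but never abandoned for good. This decaying, randomised exploration is the general idea behind index rules of this kind.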