Given two optional, positive, bounded processes Y and Y′ defined on a probability space, and a non-negative real a, the problem is to maximize the average reward E(Y_T) among all stopping times T satisfying the following constraint:
The problem is solved by Lagrangian saddle-point techniques in the set of randomized stopping times, which contains the set of ordinary stopping times.
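To illustrate why the problem is posed over randomized stopping times, here is a minimal sketch on a toy case. It rests on assumptions that are not in the text: the constraint is taken to be of the form E(Y′_T) ≤ a, the processes are deterministic finite sequences (so a stopping time is just a fixed index and a randomized stopping time is a probability distribution over indices), and the function names `best_pure` and `best_randomized` are hypothetical. The optimum is found by enumerating the vertices of the resulting linear program rather than by the Lagrangian method itself.

```python
from itertools import permutations


def best_pure(Y, Yp, a):
    """Best reward over deterministic stopping indices i with Yp[i] <= a.

    Assumes the constraint has the form E(Y'_T) <= a (an assumption,
    the constraint formula is not given in the text).
    """
    feasible = [Y[i] for i in range(len(Y)) if Yp[i] <= a]
    return max(feasible) if feasible else None


def best_randomized(Y, Yp, a):
    """Best reward over distributions on stopping indices with mean cost <= a.

    The problem is a linear program with one equality (probabilities sum
    to 1) and one inequality (mean cost <= a), so an optimum is attained
    either at a feasible pure index or at a mixture of two indices whose
    mean cost equals a exactly.
    """
    best = best_pure(Y, Yp, a)
    for i, j in permutations(range(len(Y)), 2):
        if Yp[i] < a < Yp[j]:
            # Weight q on index i chosen so that the mean cost equals a.
            q = (Yp[j] - a) / (Yp[j] - Yp[i])
            value = q * Y[i] + (1 - q) * Y[j]
            if best is None or value > best:
                best = value
    return best


# Toy data (hypothetical): stopping late pays more but costs more.
Y = [1.0, 3.0]    # reward Y_t at t = 0, 1
Yp = [0.0, 2.0]   # constrained quantity Y'_t at t = 0, 1
a = 1.0           # bound on the mean of Y'_T

pure_value = best_pure(Y, Yp, a)
randomized_value = best_randomized(Y, Yp, a)
print(pure_value, randomized_value)
```

In this toy case the only admissible ordinary stopping time is T = 0, while stopping at 0 or 1 with probability 1/2 each meets the assumed constraint with equality and earns a strictly larger average reward. The associated multiplier is the slope (3 − 1)/(2 − 0) = 1 of the reward-versus-cost trade-off, which is the kind of quantity a Lagrangian saddle-point argument identifies.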