This paper establishes the existence of an optimal stationary strategy in a leavable Markov decision process with a countable state space under the undiscounted total reward criterion.
Besides assumptions of boundedness and continuity, a further assumption is imposed on the model: the mean recurrence times must be continuous on a subset of the stationary strategies, the so-called ‘good strategies’. For practical applications it is important that this assumption is implied by an assumption about the cost structure and the transition probabilities. In the last part we point out that our results cannot, in general, be deduced from related works on bias-optimality by Dekker and Hordijk, Wijngaard, or Mann.