
The Expected Total Cost Criterion for Markov Decision Processes under Constraints: A Convex Analytic Approach

Published online by Cambridge University Press:  04 January 2016

François Dufour*
Affiliation: Université Bordeaux, IMB and INRIA Bordeaux Sud-ouest
M. Horiguchi**
Affiliation: Kanagawa University
A. B. Piunovskiy***
Affiliation: University of Liverpool

* Postal address: INRIA Bordeaux Sud-ouest, CQFD Team, 351 cours de la Libération, F-33400 Talence, France. Email address: [email protected]
** Postal address: Department of Mathematics, Faculty of Engineering, Kanagawa University, 3-27-1 Rokkakubashi, Kanagawa-ku, Yokohama 221-8686, Japan. Email address: [email protected]
*** Postal address: Department of Mathematical Sciences, University of Liverpool, Liverpool L69 7ZL, UK. Email address: [email protected]

Abstract


This paper deals with discrete-time Markov decision processes (MDPs) under constraints, where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is discussed using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, that the model is nonnegative and semicontinuous, and that there exists an admissible solution with finite cost for the associated linear program. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program given by an occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to the Markov control problem. As a consequence, these results imply that the set of randomized stationary policies is sufficient for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate theoretical issues and possible applications of the results developed in the paper.
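To fix ideas, the linear program at the heart of the convex analytic approach can be written down explicitly in the simplest finite state-action setting. This is a sketch only: the paper treats general Borel state and action spaces, and the symbols below (initial distribution ν, transition kernel p, cost c, constraint costs d_k with budgets θ_k) are illustrative stand-ins for the corresponding general objects. One minimizes over nonnegative measures μ on state-action pairs:

\[
\min_{\mu \ge 0} \; \sum_{x,a} c(x,a)\,\mu(x,a)
\quad \text{subject to} \quad
\sum_{a} \mu(y,a) = \nu(y) + \sum_{x,a} p(y \mid x,a)\,\mu(x,a) \ \text{for all } y,
\qquad
\sum_{x,a} d_k(x,a)\,\mu(x,a) \le \theta_k \ \text{for all } k.
\]

A feasible μ is an (expected) occupation measure, and an optimal μ induces a randomized stationary policy via π(a | x) = μ(x,a) / Σ_{a'} μ(x,a') wherever the denominator is positive; this is the mechanism behind the sufficiency of randomized stationary policies asserted above.

The following self-contained Python sketch solves such an LP for an invented two-state, two-action model with scipy.optimize.linprog. All numbers are hypothetical, and the kernel is made substochastic (the process is killed with probability 0.2 per step) purely so that the toy total costs are finite; the paper's results do not require transience.

import numpy as np
from scipy.optimize import linprog

nS, nA = 2, 2
# Substochastic kernel p[x, a, y]: rows sum to 0.8, so expected
# total costs are finite in this toy model.
p = 0.8 * np.array([[[0.5, 0.5], [0.9, 0.1]],
                    [[0.2, 0.8], [0.6, 0.4]]])
c = np.array([[1.0, 4.0], [2.0, 0.5]])   # cost to minimize
d = np.array([[2.0, 1.0], [0.5, 3.0]])   # constraint cost
theta = 5.0                              # constraint budget
nu = np.array([1.0, 0.0])                # initial distribution

# Variables: mu[x, a], flattened row-major. Balance equations:
#   sum_a mu(y, a) - sum_{x, a} p(y | x, a) mu(x, a) = nu(y)  for all y.
A_eq = np.zeros((nS, nS * nA))
for y in range(nS):
    for x in range(nS):
        for a in range(nA):
            A_eq[y, x * nA + a] -= p[x, a, y]
    for a in range(nA):
        A_eq[y, y * nA + a] += 1.0

res = linprog(c.ravel(), A_ub=[d.ravel()], b_ub=[theta],
              A_eq=A_eq, b_eq=nu, bounds=(0, None))
mu = res.x.reshape(nS, nA)
# Randomized stationary policy read off from the occupation measure
# (both states carry positive mass here; guard the division in general).
pi = mu / mu.sum(axis=1, keepdims=True)
print("optimal value:", res.fun)
print("policy pi(a | x):\n", pi)

In this toy instance the constraint is active, so the optimal policy genuinely randomizes in at least one state, illustrating why the sufficient class of policies is the randomized stationary ones rather than the deterministic ones.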

Type: General Applied Probability
Copyright: © Applied Probability Trust
