
OPTIMAL MIXING OF MARKOV DECISION RULES FOR MDP CONTROL

Published online by Cambridge University Press:  17 May 2011

Dinard van der Laan
Affiliation:
Tinbergen Institute and Department of Econometrics and Operations Research, VU University, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands E-mail: [email protected]

Abstract

In this article we study Markov decision process (MDP) problems with the restriction that at decision epochs, only a finite number of given Markov decision rules are admissible. For example, the set of admissible Markov decision rules could consist of some easily implementable decision rules. Additionally, many open-loop control problems can be modeled as an MDP with such a restriction on the admissible decision rules. Within the class of available policies, optimal policies are generally nonstationary, and it is difficult to prove that a given policy is optimal. We give an example with two admissible decision rules, D = {d1, d2}, for which we conjecture that the nonstationary periodic Markov policy determined by its period cycle (d1, d1, d2, d1, d2, d1, d2, d1, d2) is optimal. This conjecture is supported by results that we obtain on the structure of optimal Markov policies in general. We also present numerical results that provide additional support for the conjecture in the particular example we consider.
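The periodic policies described above can be illustrated with a small computation: a periodic Markov policy repeats a fixed cycle of decision rules, and its long-run average reward can be approximated by propagating the state distribution through the cycle. The sketch below uses a hypothetical two-state, two-action MDP (the transition probabilities, rewards, and rules are invented for illustration and are not the example from the article).

```python
# Toy illustration: long-run average reward of a periodic Markov policy that
# cycles through a fixed sequence of decision rules. The MDP below (P, r, d1,
# d2) is hypothetical, not the example studied in the article.

def step(dist, rule, P, r):
    """Propagate a state distribution one step under a decision rule.

    `rule[s]` is the action taken in state s; P[a][s][t] is the probability
    of moving from s to t under action a. Returns the new distribution and
    the expected one-step reward.
    """
    n = len(dist)
    reward = sum(dist[s] * r[s][rule[s]] for s in range(n))
    new = [0.0] * n
    for s in range(n):
        a = rule[s]
        for t in range(n):
            new[t] += dist[s] * P[a][s][t]
    return new, reward

def average_reward(cycle, P, r, dist, periods=2000):
    """Approximate the long-run average reward of the nonstationary periodic
    policy that repeats `cycle` (a tuple of decision rules)."""
    total, steps = 0.0, 0
    for _ in range(periods):
        for rule in cycle:
            dist, rew = step(dist, rule, P, r)
            total += rew
            steps += 1
    return total / steps

# Two states, two actions (all values hypothetical).
P = [[[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
     [[0.5, 0.5], [0.6, 0.4]]]   # transitions under action 1
r = [[1.0, 0.0], [0.0, 2.0]]     # r[s][a]
d1 = (0, 0)                      # decision rule: action chosen per state
d2 = (1, 1)

for cycle in [(d1,), (d2,), (d1, d2), (d1, d1, d2)]:
    print(cycle, round(average_reward(cycle, P, r, [1.0, 0.0]), 4))
```

Comparing the stationary policies (cycles of length one) against mixed cycles in this way is one simple numerical check that a periodic mixture can outperform every stationary policy in the admissible class.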

Type: Research Article

Copyright © Cambridge University Press 2011

