Uniformization for semi-Markov decision processes under stationary policies

Frederick J. Beutler; Keith W. Ross

doi:10.2307/3214096

Published online by Cambridge University Press: 14 July 2016

Frederick J. Beutler*
Affiliation: University of Michigan, Ann Arbor

Keith W. Ross**
Affiliation: University of Pennsylvania

*Postal address: Department of Electrical Engineering and Computer Science, EECS, University of Michigan, Ann Arbor, MI 48109, USA.
**Postal address: Department of Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, USA.

Abstract

Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary processes, uniformization can be accepted as valid only for simple policies.

We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDP, in which stationary (randomized) policies appear naturally. The structure of optimal constrained SMDP policies can then be elucidated by studying the corresponding controlled Markov chains. Moreover, constrained SMDP optimal policy computations can be more easily implemented in discrete time, the generalized uniformization being employed to relate discrete- and continuous-time optimal constrained policies.
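The discrepancy described in the abstract can be illustrated with a small numerical sketch. It is not the authors' construction: it assumes a hypothetical two-state process with exponential sojourn times (a special case of an SMDP), in which the action is drawn on entering state 0 and held until the next real jump. The rates, rewards, and function names are illustrative assumptions. The naive uniformized chain redraws the action at every step, including virtual jumps, which is the source of the mismatch for randomized policies.

```python
def original_average_reward(q, lam, r0, mu, r1):
    """Long-run average reward of the original process (renewal-reward).

    q[a]   : probability of choosing action a on entering state 0
    lam[a] : exit rate from state 0 under action a (held until the jump)
    r0[a]  : reward rate in state 0 under action a
    mu, r1 : exit rate and reward rate in state 1 (no action choice there)
    """
    cycle_reward = sum(qa * ra / la for qa, la, ra in zip(q, lam, r0)) + r1 / mu
    cycle_length = sum(qa / la for qa, la in zip(q, lam)) + 1.0 / mu
    return cycle_reward / cycle_length


def naive_uniformized_average_reward(q, lam, r0, mu, r1):
    """Average reward rate of the naively uniformized discrete-time chain.

    The action is redrawn at every step, so virtual jumps (self-transitions)
    change the action even though the original process would have kept it.
    """
    Lambda = max(max(lam), mu)  # uniformization rate, at least every exit rate
    p01 = sum(qa * la for qa, la in zip(q, lam)) / Lambda
    p10 = mu / Lambda
    pi0 = p10 / (p01 + p10)     # stationary distribution of the 2-state chain
    pi1 = 1.0 - pi0
    step_reward0 = sum(qa * ra for qa, ra in zip(q, r0)) / Lambda
    step_reward1 = r1 / Lambda
    # Lambda steps occur per unit time on average.
    return Lambda * (pi0 * step_reward0 + pi1 * step_reward1)


# Hypothetical parameters: two actions in state 0, one behaviour in state 1.
lam, r0, mu, r1 = [1.0, 4.0], [2.0, 5.0], 2.0, 0.0

simple = [1.0, 0.0]      # simple (non-randomized) policy: always action 0
randomized = [0.5, 0.5]  # stationary randomized policy

for name, q in [("simple", simple), ("randomized", randomized)]:
    print(name,
          original_average_reward(q, lam, r0, mu, r1),
          naive_uniformized_average_reward(q, lam, r0, mu, r1))
```

Under these assumed numbers the two averages coincide for the simple policy and differ for the randomized one: the original process spends an expected time of sum of q[a]/lam[a] in state 0, whereas the naive chain behaves as if the exit rate were the mixture sum of q[a]*lam[a].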

Keywords

Semi-Markov process; Markov decision process; Dynamic programming; Constrained optimization; Optimal policy; Lagrange multiplier

Type: Research Papers
Information: Journal of Applied Probability, Volume 24, Issue 3, September 1987, pp. 644-656

DOI: https://doi.org/10.2307/3214096
Copyright: © Applied Probability Trust 1987
