Learning algorithms for Markov decision processes

Published online by Cambridge University Press:  14 July 2016

Masami Kurano*
Affiliation: Chiba University
* Postal address: Department of Mathematics, Faculty of Education, Chiba University, Yayoi-cho, Chiba 260, Japan.

Abstract

This study concerns finite Markov decision processes in which the dynamics and reward structure are unknown but the state is observed exactly.

We establish a learning algorithm which yields an optimal policy and construct an adaptive policy which is optimal under the average expected reward criterion.
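
To make the setting concrete, the following is a minimal sketch of one generic way to act in a finite Markov decision process with unknown dynamics and rewards but an exactly observed state. It is not the learning algorithm of this paper, whose details are not given in the abstract. It uses a plain certainty-equivalence approach as an assumption: estimate the transition probabilities and mean rewards from observed transitions, then run relative value iteration on the estimated model to obtain a policy for the average expected reward criterion. The two-state example, the uniform exploration rule, and all function and variable names are hypothetical.

```python
import random

def simulate(true_P, true_R, n_steps, seed=0):
    """Collect (s, a, r, s_next) samples by choosing actions uniformly at random.
    The learner never reads true_P or true_R directly; it only sees the samples."""
    rng = random.Random(seed)
    n_states, n_actions = len(true_P), len(true_P[0])
    s, data = 0, []
    for _ in range(n_steps):
        a = rng.randrange(n_actions)
        s_next = rng.choices(range(n_states), weights=true_P[s][a])[0]
        data.append((s, a, true_R[s][a], s_next))
        s = s_next
    return data

def estimate_model(data, n_states, n_actions):
    """Empirical transition probabilities and mean rewards.
    Unvisited (s, a) pairs fall back to a uniform row and zero reward (an assumption)."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    reward_sum = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for s, a, r, s_next in data:
        counts[s][a][s_next] += 1
        reward_sum[s][a] += r
        visits[s][a] += 1
    P = [[[counts[s][a][t] / visits[s][a] if visits[s][a] else 1.0 / n_states
           for t in range(n_states)]
          for a in range(n_actions)] for s in range(n_states)]
    R = [[reward_sum[s][a] / visits[s][a] if visits[s][a] else 0.0
          for a in range(n_actions)] for s in range(n_states)]
    return P, R

def relative_value_iteration(P, R, iters=500):
    """Relative value iteration for the average-reward criterion.
    State 0 is the reference state, so the gain estimate is the backed-up value at state 0."""
    n_states, n_actions = len(P), len(P[0])
    h = [0.0] * n_states  # relative values (bias), normalised so that h[0] == 0
    gain = 0.0
    for _ in range(iters):
        q = [[R[s][a] + sum(P[s][a][t] * h[t] for t in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        v = [max(row) for row in q]
        gain = v[0]
        h = [x - gain for x in v]
    policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
    return policy, gain

# Hypothetical two-state, two-action example: reward 1 is earned only in state 1,
# and action 1 makes state 1 more likely from either state.
TRUE_P = [[[0.9, 0.1], [0.2, 0.8]],
          [[0.8, 0.2], [0.1, 0.9]]]
TRUE_R = [[0.0, 0.0],
          [1.0, 1.0]]

data = simulate(TRUE_P, TRUE_R, n_steps=5000, seed=0)
P_hat, R_hat = estimate_model(data, n_states=2, n_actions=2)
policy, gain = relative_value_iteration(P_hat, R_hat)
print("estimated policy:", policy, "estimated average reward:", round(gain, 3))
```

In this toy model the true optimal policy chooses action 1 in both states, with long-run average reward 8/9. The sketch separates exploration from optimisation for clarity; an adaptive policy in the sense of the abstract would instead update its actions while the process runs.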

Type
Short Communications
Copyright
Copyright © Applied Probability Trust 1987 
