Markov Branching Decision Chains with Interest-Rate-Dependent Rewards*

Published online by Cambridge University Press:  27 July 2009

Ying Huang
Affiliation:
Philips Laboratories, 345 Scarborough Road, Briarcliff Manor, New York 10510
Arthur F. Veinott Jr
Affiliation:
Department of Operations Research, Terman Engineering Center, Stanford University, Stanford, California 94305

Abstract

Finite-state-and-action Markov branching decision chains are studied with bounded endogenous expected population sizes and interest-rate-dependent one-period rewards that are analytic in the interest rate at zero. The existence of a stationary strong-maximum-present-value policy is established. Miller and Veinott's [1969] strong policy-improvement method is generalized to find in finite time a stationary n-present-value optimal policy and, when the one-period rewards are rational in the interest rate, a stationary strong-maximum-present-value policy. This extends previous studies of Blackwell [1962], Miller and Veinott [1969], Veinott [1974], and Rothblum [1974, 1975], in which the one-period rewards are independent of the interest rate, and Denardo [1971] in which semi-Markov decision chains with small interest rates are studied. The problem of finding a stationary n-present-value optimal policy is also formulated as a staircase linear program in which the objective function and right-hand sides, but not the constraint matrix, depend on the interest rate, and solutions for all small enough positive interest rates are sought. The optimal solutions of the primal and dual are polynomials in the reciprocal of the interest rate. A constructive rule is given for finding a stationary n-present-value optimal policy from an optimal solution of the asymptotic linear program. This generalizes the linear programming approaches for finding maximum-reward-rate and maximum-present-value policies for Markov decision chains studied by Manne [1960], d'Epenoux [1960, 1963], Balinski [1961], Derman [1962], Denardo and Fox [1968], Denardo [1970], Derman and Veinott [1972], Veinott [1973], and Hordijk and Kallenberg [1979, 1984].
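As background for the optimality criteria and the asymptotic linear program described above, the following is a brief sketch in the standard notation of the cited Markov-decision-chain literature (Miller and Veinott [1969], Veinott [1969], Jeroslow [1973]); the symbols are illustrative and are not the paper's own notation. For a stationary policy $\pi$ and a small interest rate $\rho > 0$, the expected present value of $\pi$ has a Laurent expansion about $\rho = 0$,
$$V^{\pi}(\rho) \;=\; \sum_{n=-1}^{\infty} \rho^{n}\, v_{n}^{\pi},$$
and $\pi$ is called $n$-present-value ($n$-discount) optimal if
$$\liminf_{\rho \downarrow 0}\; \rho^{-n}\bigl(V^{\pi}(\rho) - V^{\sigma}(\rho)\bigr) \;\ge\; 0 \quad \text{for every policy } \sigma,$$
while $\pi$ has strong maximum present value if $V^{\pi}(\rho) \ge V^{\sigma}(\rho)$ for every $\sigma$ and all sufficiently small $\rho > 0$. The asymptotic linear program referred to above takes the form
$$\max\ \bigl\{\, c(\rho)^{\top} x \;:\; A\,x = b(\rho),\ x \ge 0 \,\bigr\},$$
to be solved simultaneously for all sufficiently small $\rho > 0$; only the objective $c(\rho)$ and right-hand side $b(\rho)$ vary with the interest rate, while the staircase constraint matrix $A$ is fixed (cf. Jeroslow [1973], Lamond [1989]).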

Type
Research Article
Copyright
Copyright © Cambridge University Press 1995

References

1. Balinski, M.L. (1961). On solving discrete stochastic decision problems. Navy Supply System Research Study 2, Mathematica, Princeton.
2. Blackwell, D. (1962). Discrete dynamic programming. Annals of Mathematical Statistics 33: 719–726.
3. Cantaluppi, L.J. (1981). Semi-Markov decision chains with holding-time-dependent policies. Ph.D. dissertation, Department of Operations Research, Stanford University, Stanford, CA.
4. Cantaluppi, L.J. (1984). Computation of optimal policies in discounted semi-Markov decision chains. OR Spektrum 6(3): 147–160.
5. Cantaluppi, L.J. (1984). Optimality of piecewise-constant policies in semi-Markov decision chains. SIAM Journal on Control and Optimization 22(5): 723–739.
6. Dantzig, G.B. (1955). Optimal solution of a dynamic Leontief model with substitution. Econometrica 23(3): 295–302.
7. DeCani, J.S. (1963). "On the Transient and Asymptotic Behavior of a Markov Chain 'Embedded' in Continuous Time," and "A Dynamic Programming Algorithm for Embedded Markov Chains," mimeographed. Aeronautical Computer Laboratory, U.S. Naval Air Development Center, Johnsville, PA.
8. Denardo, E.V. (1970). Computing a bias-optimal policy in a discrete-time Markov decision problem. Operations Research 18(2): 279–288.
9. Denardo, E.V. (1971). Markov renewal programs with small interest rates. Annals of Mathematical Statistics 42: 477–496.
10. Denardo, E.V. & Fox, B.L. (1968). Multichain Markov renewal programs. SIAM Journal on Applied Mathematics 16(3): 468–487.
11. d'Epenoux, F. (1960). Sur un problème de production et de stockage dans l'aléatoire. Revue Française de Recherche Opérationnelle 14: 3–16.
12. d'Epenoux, F. (1963). A probabilistic production and inventory problem. (Partly redrafted from a translation of d'Epenoux [11].) Management Science 10: 98–108.
13. Derman, C. (1962). On sequential decisions and Markov chains. Management Science 9: 16–24.
14. Derman, C. & Veinott, A.F., Jr. (1972). Constrained Markov decision chains. Management Science 19(4): 389–390.
15. Fox, B. (1966). Markov renewal programming by linear fractional programming. SIAM Journal on Applied Mathematics 14(6): 1418–1432.
16. Hordijk, A. & Kallenberg, L.C.M. (1979). Linear programming and Markov decision chains. Management Science 25(4): 352–362.
17. Hordijk, A. & Kallenberg, L.C.M. (1984). Constrained undiscounted stochastic dynamic programming. Mathematics of Operations Research 9(2): 276–289.
18. Howard, R.A. (1960). Dynamic programming and Markov processes. Cambridge, MA: MIT Press.
19. Howard, R.A. (1963). Semi-Markov decision processes. Bulletin of the International Statistical Institute 40: 625–652.
20. Jeroslow, R.G. (1973). Asymptotic linear programming. Operations Research 21(5): 1128–1141.
21. Jewell, W.S. (1963). Markov-renewal programming. I: Formulation, finite return models. Operations Research 11: 938–971.
22. Lamond, B.F. (1989). A generalized inverse method for asymptotic linear programming. Mathematical Programming 43: 71–86.
23. Manne, A.S. (1960). Linear programming and sequential decisions. Management Science 6(3): 259–267.
24. Miller, B.L. & Veinott, A.F., Jr. (1969). Discrete dynamic programming with a small interest rate. Annals of Mathematical Statistics 40: 366–370.
25. Rothblum, U.G. (1974). Multiplicative Markov decision chains. Ph.D. dissertation, Department of Operations Research, Stanford University, Stanford, CA.
26. Rothblum, U.G. (1975). Normalized Markov decision chains. I: Sensitive discount optimality. Operations Research 23(4): 785–795.
27. Rothblum, U.G. & Veinott, A.F., Jr. (1975). Cumulative average optimality for normalized Markov decision chains. Unpublished manuscript.
28. Rothblum, U.G. & Veinott, A.F., Jr. (1994). Markov population decision chains. Unpublished manuscript.
29. Shapley, L.S. (1953). Stochastic games. Proceedings of the National Academy of Sciences U.S.A. 39: 1095–1100.
30. Veinott, A.F., Jr. (1968). Extreme points of Leontief substitution systems. Linear Algebra and Its Applications 1: 181–194.
31. Veinott, A.F., Jr. (1969). Discrete dynamic programming with sensitive discount optimality criteria. Annals of Mathematical Statistics 40: 1635–1660.
32. Veinott, A.F., Jr. (1973). Linear programming formulation for average reward Markov decision chains and state-action frequencies. Unpublished manuscript.
33. Veinott, A.F., Jr. (1974). Markov decision chains. In G.B. Dantzig & B.C. Eaves (eds.), Studies in Optimization, Vol. 10, pp. 124–159. Mathematical Association of America.