
Finite-horizon dynamic optimisation when the terminal reward is a concave functional of the distribution of the final state

Published online by Cambridge University Press: 01 July 2016

E. J. Collins and J. M. McNamara
Affiliation: University of Bristol
Postal address: Department of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK.

Abstract

We consider a problem similar in many respects to a finite-horizon Markov decision process, except that the reward to the individual is a strictly concave functional of the distribution of the individual's state at the final time T. Reward structures such as these are of interest to biologists studying the fitness of different strategies in a fluctuating environment. The problem fails to satisfy the usual optimality equation and cannot be solved directly by dynamic programming. We establish equations characterising the optimal final distribution and an optimal policy π*. We show that in general π* will be a Markov randomised policy (or, equivalently, a mixture of Markov deterministic policies), and we develop an iterative algorithm, based on policy improvement, which converges to π*. We also consider an infinite-population version of the problem, and show that the population cannot do better using a coordinated policy than by having each individual independently follow the individual optimal policy π*.
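To make the structure concrete, here is a minimal numerical sketch. It is not the authors' algorithm: it replaces their policy-improvement iteration with a generic conditional-gradient (Frank-Wolfe) scheme over the convex set of achievable final-state distributions, and all model data, including the hypothetical concave functional φ(μ) = Σ_e q_e log(Σ_x μ(x) r_e(x)) (an expected-log-payoff fitness of the kind the abstract alludes to), are invented for illustration. Each iteration linearises φ at the current final distribution and solves an ordinary finite-horizon MDP by backward induction, with the gradient of φ as terminal reward.

```python
# A minimal sketch, assuming hypothetical model data (states, transitions,
# and the concave functional phi). This is NOT the paper's algorithm, but a
# related conditional-gradient iteration: linearise phi at the current final
# distribution, solve the resulting ordinary finite-horizon MDP by backward
# induction, and mix the optimiser's final distribution into the current one.
import numpy as np

rng = np.random.default_rng(0)
S, A, T = 4, 2, 5                       # states, actions, horizon (hypothetical)
P = rng.random((A, S, S))               # P[a, x, y] = Pr(X_{t+1}=y | X_t=x, action a)
P /= P.sum(axis=2, keepdims=True)       # normalise each row to a distribution
mu0 = np.full(S, 1.0 / S)               # initial state distribution

# Hypothetical strictly concave terminal reward: expected log payoff over a
# random environment e, phi(mu) = sum_e q[e] * log(sum_x mu[x] * r[e, x]).
q = np.array([0.5, 0.5])
r = rng.random((2, S)) + 0.5            # payoff r[e, x] > 0 in environment e

def phi(mu):
    return float(q @ np.log(r @ mu))

def grad_phi(mu):
    # d(phi)/d(mu[x]) = sum_e q[e] * r[e, x] / (r[e] . mu)
    return (q / (r @ mu)) @ r

def linearised_mdp(g, mu0):
    """Maximise E[g(X_T)] -- a standard MDP -- by backward induction, then
    push mu0 forward through the greedy Markov deterministic policy."""
    v, policy = g.copy(), np.zeros((T, S), dtype=int)
    for t in range(T - 1, -1, -1):
        Qv = P @ v                      # Qv[a, x] = sum_y P[a, x, y] * v[y]
        policy[t] = Qv.argmax(axis=0)   # greedy action in each state
        v = Qv.max(axis=0)
    mu = mu0.copy()
    for t in range(T):                  # forward pass: final distribution
        mu = mu @ P[policy[t], np.arange(S), :]
    return policy, mu

# Start from the final distribution of an arbitrary fixed policy (action 0).
mu = mu0.copy()
for _ in range(T):
    mu = mu @ P[0]

for n in range(200):
    _, nu = linearised_mdp(grad_phi(mu), mu0)
    gamma = 2.0 / (n + 2)               # classical Frank-Wolfe step size
    mu = (1 - gamma) * mu + gamma * nu  # convex mixture of achievable final dists

print("phi at the (approximately) optimal final distribution:", phi(mu))
```

The mixture weights generated by the step sizes γ_n amount to randomising over the successive Markov deterministic policies, which illustrates the abstract's point that the optimum is in general attained only by a Markov randomised policy, i.e. a mixture of Markov deterministic ones.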

Type: General Applied Probability
Copyright: © Applied Probability Trust 1998

