Finite-horizon dynamic optimisation when the terminal reward is a concave functional of the distribution of the final state

E. J. Collins; J. M. McNamara

doi:10.1239/aap/1035227995


Part of: Hamilton-Jacobi theories, including dynamic programming

Published online by Cambridge University Press: 01 July 2016


E. J. Collins*: Bristol University
J. M. McNamara*: Bristol University
* Postal address: Department of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK.


Abstract

We consider a problem similar in many respects to a finite horizon Markov decision process, except that the reward to the individual is a strictly concave functional of the distribution of the state of the individual at final time T. Reward structures such as these are of interest to biologists studying the fitness of different strategies in a fluctuating environment. The problem fails to satisfy the usual optimality equation and cannot be solved directly by dynamic programming. We establish equations characterising the optimal final distribution and an optimal policy π*. We show that in general π* will be a Markov randomised policy (or equivalently a mixture of Markov deterministic policies) and we develop an iterative, policy improvement based algorithm which converges to π*. We also consider an infinite population version of the problem, and show that the population cannot do better using a coordinated policy than by each individual independently following the individual optimal policy π*.
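The claim that an optimal policy is in general randomised can be illustrated with a toy one-step problem. The sketch below is not taken from the paper: the fitness table, the environment probabilities and the expected-log form of the terminal reward are all assumptions chosen only to give a concave functional of the final-state distribution. It compares the two deterministic policies with mixtures of them.

```python
import math

# Hypothetical two-environment fitness table (an assumption, not from the paper):
# FITNESS[e][s] is the payoff of ending in final state s when environment e occurs.
FITNESS = [(2.0, 0.5), (0.5, 2.0)]
ENV_PROBS = [0.5, 0.5]


def final_distribution(q):
    """One-step problem: action 0 leads to state 0, action 1 leads to state 1.

    A policy choosing action 1 with probability q gives this final distribution.
    """
    return (1.0 - q, q)


def terminal_reward(dist):
    """Expected log of the distribution-averaged payoff (concave in dist)."""
    return sum(
        p * math.log(sum(d * f for d, f in zip(dist, fit)))
        for p, fit in zip(ENV_PROBS, FITNESS)
    )


# Evaluate the reward on a grid of randomisation probabilities q in [0, 1].
grid = [i / 100 for i in range(101)]
values = [terminal_reward(final_distribution(q)) for q in grid]
best_index = max(range(len(grid)), key=lambda i: values[i])
best_q = grid[best_index]

pure_0 = values[0]    # always choose action 0
pure_1 = values[-1]   # always choose action 1
mixed = values[best_index]

print(f"pure action 0: {pure_0:.4f}")
print(f"pure action 1: {pure_1:.4f}")
print(f"best mixture q={best_q:.2f}: {mixed:.4f}")
```

In this toy setting the best mixture strictly outperforms both deterministic policies. Because the reward depends on the whole final distribution rather than on the realised final state, it is not an expectation of a per-state reward, which is why the usual backward-induction argument does not apply directly.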

Keywords

Fluctuating environment; Markov decision processes; dynamic programming

MSC classification

Primary: 90C40: Markov and semi-Markov decision processes

Secondary: 92D40: Ecology; 49L20: Dynamic programming method

Type: General Applied Probability
Information: Advances in Applied Probability, Volume 30, Issue 1, March 1998, pp. 122–136

DOI: https://doi.org/10.1239/aap/1035227995
Copyright: Copyright © Applied Probability Trust 1998


