On gradual-impulse control of continuous-time Markov decision processes with exponential utility

Published online by Cambridge University Press: 01 July 2021

Xin Guo*
Affiliation: Tsinghua University
Aiko Kurushima**
Affiliation: Sophia University
Alexey Piunovskiy***
Affiliation: University of Liverpool
Yi Zhang***
Affiliation: University of Liverpool
*Postal address: School of Economics and Management, Tsinghua University, Beijing 100084, China. Email address: [email protected]
**Postal address: Department of Economics, Sophia University, 7-1 Kioi-cho, Chiyoda-ku, Tokyo, 102-8554, Japan. Email address: [email protected]
***Postal address: Department of Mathematical Sciences, University of Liverpool, Liverpool, L69 7ZL, UK.

Abstract

We consider a gradual-impulse control problem of continuous-time Markov decision processes, where the system performance is measured by the expectation of the exponential utility of the total cost. We show, under natural conditions on the system primitives, the existence of a deterministic stationary optimal policy out of a more general class of policies that allow multiple simultaneous impulses, randomized selection of impulses with random effects, and accumulation of jumps. After characterizing the value function using the optimality equation, we reduce the gradual-impulse control problem to an equivalent simple discrete-time Markov decision process, whose action space is the union of the sets of gradual and impulsive actions.

Type
Original Article
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of Applied Probability Trust
