
Risk-sensitive semi-Markov decision processes with general utilities and multiple criteria

Published online by Cambridge University Press: 16 November 2018

Yonghui Huang, Sun Yat-Sen University
Zhaotong Lian, University of Macau
Xianping Guo, Sun Yat-Sen University
Postal address: School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China.
Postal address: Faculty of Business Administration, University of Macau, Macau, China.

Abstract

In this paper we investigate risk-sensitive semi-Markov decision processes with a Borel state space, unbounded cost rates and general utility functions. The performance criteria are several expected utilities of the total cost over a finite horizon. Our analysis is based on a finite-horizon occupation measure. For each policy, we express the distribution of the finite-horizon cost in terms of this occupation measure, without the need for a discount factor. For both the unconstrained and the constrained problems, we establish the existence of optimal policies and show how to compute them. In particular, for the constrained problem we develop a linear program and its dual, and we prove strong duality between the two programs. Finally, we present two special cases of our results: the discrete-time model and the chance-constrained problem.
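The abstract's central device is that, for each policy, the law of the finite-horizon cost can be written in terms of an occupation measure, so that any expected utility of that cost becomes a linear functional of the measure. The sketch below illustrates this idea only in the simplest discrete-time setting, under assumptions that are ours and not the paper's: finitely many states and actions, integer one-step costs, no sojourn times, and a policy that may randomize on the step t, the current state x and the cost y accumulated so far. The model data (`COST`, `TRANS`) and all function names are hypothetical. Taking the occupation measure over (state, accumulated cost, action) triples makes P(C = z), and hence E[u(C)] for any utility u (including the indicator u(c) = 1{c > k}, which gives a chance-constraint probability), linear in the measure.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model (illustration only, not taken from the paper):
# two states, two actions, integer one-step costs c(x, a) and
# transition probabilities P(x' | x, a), stored as exact fractions.
COST: Dict[Tuple[int, int], int] = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 1}
TRANS: Dict[Tuple[int, int], Dict[int, Fraction]] = {
    (0, 0): {0: Fraction(1, 2), 1: Fraction(1, 2)},
    (0, 1): {0: Fraction(1)},
    (1, 0): {1: Fraction(1)},
    (1, 1): {0: Fraction(3, 4), 1: Fraction(1, 4)},
}

# A policy maps (step t, state x, accumulated cost y) to action probabilities.
Policy = Callable[[int, int, int], Dict[int, Fraction]]
Triple = Tuple[int, int, int]  # (state, accumulated cost, action)


def occupation_measures(policy: Policy, x0: int, horizon: int) -> List[Dict[Triple, Fraction]]:
    """Return mu_t(x, y, a) = P(X_t = x, Y_t = y, A_t = a) for t = 0..horizon-1,
    where Y_t is the cost accumulated before step t (no discounting)."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    state_dist: Dict[Tuple[int, int], Fraction] = {(x0, 0): Fraction(1)}
    mus = []
    for t in range(horizon):
        mu: Dict[Triple, Fraction] = defaultdict(Fraction)
        nxt: Dict[Tuple[int, int], Fraction] = defaultdict(Fraction)
        for (x, y), p in state_dist.items():
            for a, q in policy(t, x, y).items():
                mu[(x, y, a)] += p * q
                y_next = y + COST[(x, a)]
                # Propagate mass to (next state, updated accumulated cost).
                for x_next, r in TRANS[(x, a)].items():
                    nxt[(x_next, y_next)] += p * q * r
        mus.append(dict(mu))
        state_dist = dict(nxt)
    return mus


def cost_distribution(mus: List[Dict[Triple, Fraction]]) -> Dict[int, Fraction]:
    """Law of the total cost C, obtained linearly from the last occupation
    measure: P(C = z) = sum of mu_{N-1}(x, y, a) over triples with y + c(x, a) = z."""
    dist: Dict[int, Fraction] = defaultdict(Fraction)
    for (x, y, a), m in mus[-1].items():
        dist[y + COST[(x, a)]] += m
    return dict(dist)


def expected_utility(dist: Dict[int, Fraction], u: Callable[[int], int]) -> Fraction:
    """E[u(C)] for a utility u; u(c) = 1{c > k} gives the probability P(C > k)."""
    return sum((p * u(c) for c, p in dist.items()), Fraction(0))


if __name__ == "__main__":
    # Randomize uniformly between the two actions at every step.
    uniform: Policy = lambda t, x, y: {0: Fraction(1, 2), 1: Fraction(1, 2)}
    dist = cost_distribution(occupation_measures(uniform, x0=0, horizon=2))
    print(dist)                                          # {0: Fraction(1, 8), 1: Fraction(1, 2), 2: Fraction(3, 8)}
    print(expected_utility(dist, lambda c: int(c > 1)))  # 3/8
```

Because every quantity above is linear in the occupation measures, a constrained problem of the form "minimize one expected utility subject to bounds on others" could in principle be posed as a linear program over such measures; the sketch only evaluates fixed policies and does not reproduce the paper's linear program, its dual, or the semi-Markov (continuous sojourn time) structure.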

Type: Original Article
Copyright © Applied Probability Trust 2018

