
On zero-sum two-person undiscounted semi-Markov games with a multichain structure

Published online by Cambridge University Press:  08 September 2017

Prasenjit Mondal*
Affiliation: Government General Degree College, Ranibandh
* Postal address: Mathematics Department, Government General Degree College, Ranibandh, Bankura 722135, India. Email address: [email protected]

Abstract

Zero-sum two-person finite undiscounted (limiting ratio average) semi-Markov games (SMGs) with a general multichain structure are considered. We derive the strategy evaluation equations for stationary strategies of the players. A relation is established between the payoff in the multichain SMG and the payoff in the associated stochastic game (SG) obtained by a data transformation. We prove that the multichain optimality equations (OEs) for an SMG have a solution if and only if the associated SG has optimal stationary strategies. Although a solution of the OEs need not be optimal for the SMG, we establish the significance of studying the OEs for a multichain SMG. We provide an example of an SMG in which one player has no optimal strategy in the stationary class but has an optimal semistationary strategy (one that depends only on the initial and current states of the game). For SMGs with absorbing states, we prove that solutions of the game in which both players are restricted to semistationary strategies are also solutions of the unrestricted game. Finally, we prove the existence of stationary optimal strategies for unichain SMGs and conclude that the unichain condition is equivalent to requiring that the game satisfies certain recurrence/ergodicity/weak communication conditions.
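For orientation, the following is a minimal sketch of the limiting ratio average criterion and of a standard data transformation of the kind used to associate a stochastic game with an SMG. The notation ($\phi$, $r$, $\bar\tau$, $p$, $\gamma$, $\delta_{ss'}$) is assumed here for illustration and is not taken from the paper, whose exact normalisation may differ.

\[
  \phi(s,\pi_1,\pi_2)
  \;=\;
  \liminf_{n\to\infty}
  \frac{\mathbb{E}_{s}^{\pi_1,\pi_2}\!\left[\sum_{m=1}^{n} r(s_m,a_m,b_m)\right]}
       {\mathbb{E}_{s}^{\pi_1,\pi_2}\!\left[\sum_{m=1}^{n} \bar\tau(s_m,a_m,b_m)\right]},
\]
where $r(s,a,b)$ is the expected immediate reward and $\bar\tau(s,a,b)>0$ the expected sojourn time when the players choose actions $a$ and $b$ in state $s$. A data transformation in the spirit of those used for semi-Markov decision processes replaces the SMG data by
\[
  \tilde r(s,a,b) \;=\; \frac{r(s,a,b)}{\bar\tau(s,a,b)},
  \qquad
  \tilde p(s'\mid s,a,b) \;=\; \gamma\,\frac{p(s'\mid s,a,b)-\delta_{ss'}}{\bar\tau(s,a,b)} + \delta_{ss'},
\]
with $0<\gamma\le\min_{s,a,b}\bar\tau(s,a,b)$, so that each $\tilde p(\cdot\mid s,a,b)$ remains a probability vector. The transformed data define an ordinary (unit sojourn time) stochastic game whose limiting average payoffs can then be compared with those of the original SMG.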

Type: Research Article
Copyright © Applied Probability Trust 2017

