
Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems

Published online by Cambridge University Press: 22 February 2012

Laetitia Matignon*
Affiliation:
FEMTO-ST Institute, UMR CNRS 6174, UFC/ENSMM/UTBM, 24 rue Alain Savary, 25000 Besançon, France
Guillaume J. Laurent*
Affiliation:
FEMTO-ST Institute, UMR CNRS 6174, UFC/ENSMM/UTBM, 24 rue Alain Savary, 25000 Besançon, France
Nadine Le Fort-Piat*
Affiliation:
FEMTO-ST Institute, UMR CNRS 6174, UFC/ENSMM/UTBM, 24 rue Alain Savary, 25000 Besançon, France

Abstract

In fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to these challenges: matrix games, Boutilier's coordination game, predator pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. These algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive frequency maximum Q-value and win-or-learn fast policy hill climbing. The paper concludes with an overview of each learning algorithm's strengths and weaknesses against each challenge, which can serve as a basis for choosing an appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
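
As a rough illustration of how three of the Q-learning variants named in the abstract differ, the sketch below applies their update rules, as they are commonly stated in the literature for a single-state (matrix) game, to the value of one action. The function names, the learning rates and the reward sequence are assumptions made for this example and are not taken from the paper.

```python
from typing import Callable, Dict, Iterable


def decentralized_update(q: float, reward: float, alpha: float = 0.1) -> float:
    """Plain independent Q-learning step for a stateless game."""
    return q + alpha * (reward - q)


def distributed_update(q: float, reward: float) -> float:
    """Optimistic step: never decrease the estimate (suited to deterministic games)."""
    return max(q, reward)


def hysteretic_update(q: float, reward: float,
                      alpha: float = 0.1, beta: float = 0.01) -> float:
    """Two learning rates: alpha for increases, a smaller beta for decreases."""
    delta = reward - q
    rate = alpha if delta >= 0 else beta
    return q + rate * delta


def replay(update: Callable[[float, float], float],
           rewards: Iterable[float], q0: float = 0.0) -> float:
    """Apply one update rule to a sequence of rewards for a single action."""
    q = q0
    for r in rewards:
        q = update(q, r)
    return q


# Hypothetical rewards seen by one agent for the same action: high when the
# teammate coordinates, heavily penalized when the teammate explores.
REWARDS = [11.0, -30.0, 11.0, -30.0]

results: Dict[str, float] = {
    "decentralized": replay(decentralized_update, REWARDS),
    "distributed": replay(distributed_update, REWARDS),
    "hysteretic": replay(hysteretic_update, REWARDS),
}
```

On this sequence the plain update is dragged below zero by the two penalties, the optimistic update keeps the best reward seen, and the hysteretic update ends up between the two.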

Type: Articles
Copyright © Cambridge University Press 2012

