
Plan-based reward shaping for multi-agent reinforcement learning

Published online by Cambridge University Press: 11 February 2016

Sam Devlin
Affiliation:
Department of Computer Science, University of York, York, YO10 5GH, England e-mail: [email protected], [email protected]
Daniel Kudenko
Affiliation:
Department of Computer Science, University of York, York, YO10 5GH, England e-mail: [email protected], [email protected]

Abstract

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question of how to generate a useful potential function remains open.
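For context, in the standard formulation of potential-based reward shaping (Ng et al. 1999), the agent receives an additional shaping reward defined as the discounted difference of a real-valued potential function $\Phi$ over states:

$$F(s, s') = \gamma \Phi(s') - \Phi(s)$$

The agent then learns from $r + F(s, s')$ rather than the environment reward $r$ alone; the open question addressed here is how to choose $\Phi$ so that this extra signal is actually informative.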

Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learning. Following up on this work, we investigate the use of STRIPS planning knowledge in the context of MARL.
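As a rough illustration of the plan-based idea, the sketch below assigns each abstract state a potential proportional to its position in a high-level plan, so states further along the plan receive higher potential. All names, the example plan, and the scaling constant are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of plan-based reward shaping, assuming the plan is an
    # ordered list of abstract states the agent should pass through.
    GAMMA = 0.99    # discount factor used by the learning agent (assumed)
    OMEGA = 100.0   # scaling factor for the potential function (assumed)

    # Illustrative high-level plan produced by a STRIPS planner.
    plan = ["at_start", "holding_key", "door_open", "at_goal"]

    def potential(abstract_state):
        """Potential of a state = scaled progress through the plan.
        States not on the plan receive the lowest potential (zero)."""
        if abstract_state in plan:
            return OMEGA * (plan.index(abstract_state) + 1)
        return 0.0

    def shaping_reward(prev_abstract_state, next_abstract_state):
        """Shaping reward F(s, s') = gamma * Phi(s') - Phi(s),
        added to the environment reward at every step."""
        return GAMMA * potential(next_abstract_state) - potential(prev_abstract_state)

    # Example: progressing from "holding_key" to "door_open" yields a
    # positive shaping reward, encouraging movement along the plan.
    print(shaping_reward("holding_key", "door_open"))

In the multi-agent setting studied here, such a potential can be derived either from a joint plan over all agents or from each agent's individual plan, which is the distinction the experiments examine.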

Our results show that a potential function based on joint or individual plan knowledge can significantly improve MARL performance compared with no shaping. In addition, we investigate the limitations of individual plan knowledge as a source of reward shaping in cases where the combination of individual agent plans causes conflict.

Type: Articles
Copyright: © Cambridge University Press, 2016

