
Team learning from human demonstration with coordination confidence

Published online by Cambridge University Press: 05 November 2019

Bikramjit Banerjee
Affiliation:
School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA; e-mail: [email protected]
Syamala Vittanala
Affiliation:
School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA; e-mail: [email protected]
Matthew Edmund Taylor
Affiliation:
School of Electrical Engineering & Computer Science, Washington State University, Pullman, WA 99164, USA; e-mail: [email protected]

Abstract

Among an array of techniques proposed to speed up reinforcement learning (RL), learning from human demonstration has a proven record of success. A related technique, called Human-Agent Transfer, and its confidence-based derivatives have been successfully applied to single-agent RL. This article investigates their application to collaborative multi-agent RL problems. We show that a first-cut extension may leave room for improvement in some domains, and propose a new algorithm called coordination confidence (CC). CC analyzes the difference in perspectives between a human demonstrator (global view) and the learning agents (local view) and informs the agents' action choices when the difference is critical and simply following the human demonstration can lead to miscoordination. We conduct experiments in three domains to investigate the performance of CC in comparison with relevant baselines.
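The paper's actual CC procedure is not reproduced here. The following is a minimal, hypothetical Python sketch of the general idea the abstract describes: confidence-gated reuse of a human demonstration by one team member, which copies the demonstrated action only where an estimated coordination confidence is high and otherwise falls back to its own epsilon-greedy Q-learning policy. All names (CCAgent, demo_policy, demo_confidence, conf_threshold) are illustrative assumptions, not the authors' API.

    import random
    from collections import defaultdict

    class CCAgent:
        """One team member that reuses a human demonstration only where a
        coordination-confidence estimate suggests the demonstrator's global
        view and this agent's local view are likely to agree. (Sketch only;
        the paper's CC algorithm may differ.)"""

        def __init__(self, actions, demo_policy, demo_confidence,
                     alpha=0.1, gamma=0.95, epsilon=0.1, conf_threshold=0.7):
            self.actions = actions                  # available local actions
            self.demo_policy = demo_policy          # local obs -> demonstrated action
            self.demo_confidence = demo_confidence  # local obs -> score in [0, 1]
            self.alpha, self.gamma = alpha, gamma
            self.epsilon = epsilon
            self.conf_threshold = conf_threshold
            self.q = defaultdict(float)             # tabular Q-values keyed by (obs, action)

        def act(self, obs):
            # High confidence: copying the demonstrator from this local view
            # is unlikely to cause miscoordination with teammates.
            if self.demo_confidence(obs) >= self.conf_threshold:
                return self.demo_policy(obs)
            # Low confidence: fall back to the agent's own epsilon-greedy policy.
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(obs, a)])

        def update(self, obs, action, reward, next_obs):
            # Standard one-step Q-learning update on the agent's local view.
            best_next = max(self.q[(next_obs, a)] for a in self.actions)
            td_error = reward + self.gamma * best_next - self.q[(obs, action)]
            self.q[(obs, action)] += self.alpha * td_error

A natural refinement, also only an assumption here, is to decay conf_threshold (or the confidence estimate itself) over training, so the team gradually weans itself off the demonstration as the agents' own value estimates improve.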

Type: Research Article
Copyright: © Cambridge University Press, 2019

