
Reinforcement learning-based collision avoidance: impact of reward function and knowledge transfer

Published online by Cambridge University Press: 16 March 2020

Xiongqing Liu
Affiliation:
Department of Aerospace and Mechanical Engineering, University of Southern California, 3650 McClintock Avenue, OHE-430, Los Angeles, CA 90089-1453, USA
Yan Jin*
Affiliation:
Department of Aerospace and Mechanical Engineering, University of Southern California, 3650 McClintock Avenue, OHE-430, Los Angeles, CA 90089-1453, USA
Author for correspondence: Yan Jin, E-mail: [email protected]

Abstract

Collision avoidance for robots and vehicles in unpredictable environments is a challenging task. Various control strategies have been developed for the agent (i.e., a robot or vehicle) to sense the environment, assess the situation, and select the optimal actions to avoid collision and accomplish its mission. In our research on autonomous ships, we take a machine learning approach to collision avoidance. The lack of available steering data from human ship masters has made it necessary to acquire collision avoidance knowledge through reinforcement learning (RL). Given that the learned neural network tends to be a black box, it is desirable to have a method for designing an agent's behavior so that the desired knowledge is captured. Furthermore, RL on complex tasks can be time consuming or even infeasible. A multi-stage learning method is therefore needed in which agents first learn from simple tasks and then transfer their learned knowledge to closely related but more complex tasks. In this paper, we explore ways of designing agent behaviors through tuning reward functions and devise a transfer RL method for multi-stage knowledge acquisition. Results from computer simulation-based agent training show that it is important to understand the role of each component in a reward function and of the various design parameters in transfer RL. The settings of these parameters all depend on the complexity of the tasks and the similarity between them.
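To make the abstract's two themes concrete, the following is a minimal Python sketch, not the authors' implementation: the state variables, the component weights (w_progress, w_clearance, time_penalty), the terminal reward values, and the layer-copying transfer scheme are all illustrative assumptions for exposition only.

import copy

def reward(dist_to_goal, prev_dist_to_goal, dist_to_obstacle,
           collided, reached_goal,
           w_progress=1.0, w_clearance=0.2, time_penalty=0.01):
    """Composite reward: mission progress, obstacle clearance, and time cost."""
    if collided:
        return -1.0                                       # terminal collision penalty
    if reached_goal:
        return 1.0                                        # terminal goal bonus
    r = w_progress * (prev_dist_to_goal - dist_to_goal)   # reward progress toward the goal
    r -= w_clearance * max(0.0, 1.0 - dist_to_obstacle)   # penalize closing in on obstacles
    r -= time_penalty                                     # small per-step cost discourages loitering
    return r

def transfer_weights(source_params, target_params, n_layers_to_copy):
    """Seed the first layers of a target-task network with weights learned on a
    simpler source task; deeper layers keep their random initialization and are
    fine-tuned on the more complex task (one common transfer-RL scheme)."""
    for i in range(n_layers_to_copy):
        target_params[i] = copy.deepcopy(source_params[i])
    return target_params

With these illustrative weights, a step that closes 0.5 distance units on the goal while passing 0.8 units from the nearest obstacle yields 1.0 * 0.5 - 0.2 * 0.2 - 0.01 = 0.45; the progress term dominates, while the clearance and time terms shape the safety margin and duration of the maneuver.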

Type
Research Article

Copyright
© Cambridge University Press 2020

