
AI-Based Learning Approach with Consideration of Safety Criteria on Example of a Depalletization Robot

Published online by Cambridge University Press: 26 July 2019

Mark Jocas*, Munich University of Applied Sciences
Philip Kurrek, Munich University of Applied Sciences
Firas Zoghlami, Munich University of Applied Sciences
Mario Gianni, University of Plymouth
Vahid Salehi, Munich University of Applied Sciences

Abstract


Robotic systems must achieve a certain level of process safety while performing their task and, at the same time, comply with safety criteria for the expected behaviour. To do so, the system must be aware of the risks associated with performing the task so that it can take them into account. Once the system has learned the safety aspects, further training on the task must no longer influence them. To achieve this, we present a concept for the design of a neural network that combines these properties: it allows safe behaviour to be learned and then fixed. Subsequent training of the task execution no longer affects safety and, in comparison with a conventional neural network, achieves the targeted results.
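The abstract does not describe the network architecture in detail. As a rough illustration of the general idea of learning a safety component first and then fixing its weights, the following pure-Python sketch uses two logistic units, hypothetical input features `[hazard, target_reachable]`, and a multiplicative gate `task * (1 - risk)`. All of these are assumptions for illustration, not the authors' design. It compares a frozen safety unit with an unfrozen copy that keeps receiving gradients during task training, as an end-to-end conventional network would.

```python
import copy
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LogisticUnit:
    """A single logistic neuron, p = sigmoid(w . x + b), trained by gradient descent."""

    def __init__(self, n_inputs: int, seed: int) -> None:
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
        self.b = 0.0
        self.frozen = False  # when True, update() leaves w and b untouched

    def predict(self, x) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, x, grad_logit: float, lr: float) -> None:
        """One gradient-descent step, given dLoss/dlogit for input x."""
        if self.frozen:
            return
        self.w = [wi - lr * grad_logit * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * grad_logit


def combined_output(task, safety, x) -> float:
    """Task score gated by the predicted risk: t * (1 - r)."""
    return task.predict(x) * (1.0 - safety.predict(x))


def train_safety(safety, data, epochs=200, lr=0.5):
    # Phase 1: binary cross-entropy on safety labels, so dLoss/dlogit = p - y.
    for _ in range(epochs):
        for x, y in data:
            safety.update(x, safety.predict(x) - y, lr)


def train_task(task, safety, data, epochs=200, lr=0.5):
    # Phase 2: squared error on the gated output, back-propagated to both units.
    for _ in range(epochs):
        for x, y in data:
            t, r = task.predict(x), safety.predict(x)
            d_out = t * (1.0 - r) - y  # dLoss/dout for 0.5 * (out - y) ** 2
            task.update(x, d_out * (1.0 - r) * t * (1.0 - t), lr)
            safety.update(x, -d_out * t * r * (1.0 - r), lr)  # no-op if frozen


# Hypothetical features x = [hazard, target_reachable], both in [0, 1].
GRID = [(h / 5, g / 2) for h in range(6) for g in range(3)]
SAFETY_DATA = [([h, g], 1.0 if h > 0.5 else 0.0) for h, g in GRID]
# The task objective alone rewards acting whenever the target is reachable,
# regardless of the hazard.
TASK_DATA = [([h, g], 1.0 if g > 0.5 else 0.0) for h, g in GRID]

safety = LogisticUnit(2, seed=1)
train_safety(safety, SAFETY_DATA)
safety_before = (list(safety.w), safety.b)

frozen_safety = copy.deepcopy(safety)
frozen_safety.frozen = True          # safety weights are fixed from here on
free_safety = copy.deepcopy(safety)  # "conventional": task gradients reach it

task_frozen = LogisticUnit(2, seed=2)
task_free = LogisticUnit(2, seed=2)
train_task(task_frozen, frozen_safety, TASK_DATA)
train_task(task_free, free_safety, TASK_DATA)

print("frozen safety unchanged:", (frozen_safety.w, frozen_safety.b) == safety_before)
print("conventional safety unchanged:", (free_safety.w, free_safety.b) == safety_before)
```

Freezing is modelled here simply by skipping the parameter update; in a deep-learning framework the analogous step would be to exclude the safety parameters from gradient updates (for example, `requires_grad_(False)` on the corresponding PyTorch parameters). The multiplicative gate is one possible way to let the fixed safety estimate suppress task actions in hazardous states; other combinations are equally plausible.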

Type: Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright © The Author(s) 2019
