
Zero-shot sim-to-real transfer of reinforcement learning framework for robotics manipulation with demonstration and force feedback

Published online by Cambridge University Press:  07 September 2022

Yuanpei Chen
Affiliation:
College of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Chao Zeng
Affiliation:
School of Automation, Guangdong University of Technology, Guangzhou, China
Zhiping Wang
Affiliation:
School of Electronic Engineering, Dongguan University of Technology, Dongguan, China
Peng Lu
Affiliation:
Department of Mechanical Engineering, The University of Hong Kong, Hong Kong, China
Chenguang Yang*
Affiliation:
College of Automation Science and Engineering, South China University of Technology, Guangzhou, China
*Corresponding author. E-mail: [email protected]

Abstract

In robot reinforcement learning (RL), the reality gap has long restricted the robustness and generalization of algorithms. We propose Simulation Twin (SimTwin): a deep RL framework that helps transfer a model directly from simulation to reality without any real-world training. SimTwin consists of an RL module and an adaptive correct module. In the RL module, we train the policy with the soft actor-critic algorithm in a simulator only, using demonstration and domain randomization. In the adaptive correct module, we design and train a neural network that simulates the human error-correction process using force feedback. We then combine the two modules through a digital twin to control real-world robots: the framework automatically corrects the simulator parameters by comparing the simulator with reality, and then generalizes the correct action through the trained policy network without additional training. We demonstrate the proposed method on a cabinet-opening task; the experiments show that our framework can reduce the reality gap without any real-world training.
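
The correction idea in the abstract (compare simulator and reality, then adjust simulator parameters) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract describes a trained neural network for the correction step, whereas the sketch below swaps in a plain proportional update on a one-dimensional handle position. The contact model, the function names (`sample_randomized_params`, `contact_force`, `correct_sim_offset`), and all numeric values are assumptions made for illustration only.

```python
import random


def sample_randomized_params(rng, nominal_offset=0.0, spread=0.05):
    """Domain randomization (toy): draw a handle offset in metres
    uniformly around a nominal value."""
    return nominal_offset + rng.uniform(-spread, spread)


def contact_force(gripper_pos, handle_pos, stiffness=200.0):
    """Hypothetical linear contact model: force in newtons is
    proportional to the gripper/handle position mismatch."""
    return stiffness * (gripper_pos - handle_pos)


def correct_sim_offset(sim_handle, real_handle, gripper_pos,
                       gain=0.5, stiffness=200.0, tol=1e-3, max_iters=100):
    """Shift the simulated handle until simulated and measured forces agree.

    A proportional rule stands in for the learned correction network.
    In a real system `real_handle` is unknown and only the measured
    force would be available; here it is used to fake that measurement.
    """
    for step in range(max_iters):
        f_real = contact_force(gripper_pos, real_handle, stiffness)
        f_sim = contact_force(gripper_pos, sim_handle, stiffness)
        error = f_real - f_sim  # equals stiffness * (sim_handle - real_handle)
        if abs(error) < tol:
            return sim_handle, step
        # Move the simulated handle by a fraction of the implied mismatch.
        sim_handle -= gain * error / stiffness
    return sim_handle, max_iters


if __name__ == "__main__":
    rng = random.Random(0)
    real_handle = sample_randomized_params(rng)  # stands in for the real world
    corrected, steps = correct_sim_offset(0.0, real_handle, gripper_pos=0.1)
    print(f"real={real_handle:.4f} m, corrected sim={corrected:.4f} m, steps={steps}")
```

With a gain of 0.5 the position mismatch halves on every iteration, so the simulated parameter converges geometrically; once it matches, a policy trained over the randomized range can act on the corrected simulator state.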

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

