
Robot imitation from multimodal observation with unsupervised cross-modal representation

Published online by Cambridge University Press:  08 November 2024

Xuanhui Xu
Affiliation:
College of Electronic and Information Engineering, Tongji University, Shanghai, China
Mingyu You*
Affiliation:
College of Electronic and Information Engineering, Tongji University, Shanghai, China; National Key Laboratory of Autonomous Intelligent Unmanned Systems, Frontiers Science Center for Intelligent Autonomous Systems, Ministry of Education, Tongji University, Shanghai, China
Hongjun Zhou
Affiliation:
College of Electronic and Information Engineering, Tongji University, Shanghai, China
Bin He
Affiliation:
College of Electronic and Information Engineering, Tongji University, Shanghai, China; National Key Laboratory of Autonomous Intelligent Unmanned Systems, Frontiers Science Center for Intelligent Autonomous Systems, Ministry of Education, Tongji University, Shanghai, China
*
Corresponding author: Mingyu You; Email: [email protected]

Abstract

Imitation from Observation (IfO) enables a robot to imitate tasks from unlabeled videos via reinforcement learning (RL). The performance of an IfO algorithm depends on its ability to extract task-relevant representations, since raw images carry far more information than the task requires. Existing IfO algorithms extract image representations with either a simple encoding network or a pre-trained network. Without action labels, it is difficult to design a supervised, task-relevant proxy task to train a simple encoding network, while representations extracted by a pre-trained network such as ResNet are often task-irrelevant. In this article, we propose a new approach to robot IfO via multimodal observations. Different modalities describe the same information from different perspectives, which makes it possible to design an unsupervised proxy task. Our approach contains two modules: an unsupervised cross-modal representation (UCMR) module and a self-behavioral cloning (self-BC)-based RL module. The UCMR module learns to extract task-relevant representations via a multimodal unsupervised proxy task. The self-BC module collects successful experiences during RL training for further offline policy optimization. We evaluate our approach on real-robot pouring-water, quantitative-pouring, and sand-pouring tasks, on which the robot achieves state-of-the-art performance.
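The abstract's central idea, using a second modality as a free supervisory signal so that no action labels are needed, is commonly realized with a contrastive objective. The toy sketch below is not the authors' actual UCMR proxy task; the simulated encoders, the noise model, and the symmetric InfoNCE loss are illustrative assumptions. It shows how time-aligned embeddings from two modalities can be pulled together while mismatched pairs in the batch are pushed apart:

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cross_modal_infonce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE: embeddings from the two modalities at the same
    time step are positives; every other pairing in the batch is a negative."""
    z_a, z_b = normalize(z_a), normalize(z_b)
    logits = z_a @ z_b.T / temperature            # (N, N) similarity matrix
    idx = np.arange(len(z_a))                     # diagonal = matched pairs
    def xent(l):                                  # cross-entropy, diagonal targets
        l = l - l.max(axis=1, keepdims=True)      # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()
    return 0.5 * (xent(logits) + xent(logits.T))  # both retrieval directions

rng = np.random.default_rng(0)
latent = rng.normal(size=(8, 16))                 # shared task-relevant state
proj = rng.normal(size=(16, 32))
# Stand-ins for two modality encoders that observe the same underlying state:
z_vision = latent @ proj + 0.1 * rng.normal(size=(8, 32))
z_other = latent @ proj + 0.1 * rng.normal(size=(8, 32))

aligned = cross_modal_infonce(z_vision, z_other)
mismatched = cross_modal_infonce(z_vision, np.roll(z_other, 1, axis=0))
```

Training an encoder to minimize such a loss requires no action labels, which is the property the abstract exploits. In the paper's pipeline, the representations learned this way would then feed the RL policy, with self-BC replaying successful rollouts offline; that description is a paraphrase of the abstract, not of the published algorithm.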

Type
Research Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press


Supplementary material

Xu et al. supplementary materials 1–10 (file sizes: 189 B, 314 B, 557 B, 287 B, 353 B, 40.2 MB, 261 B, 521 B, 40.4 MB, 40.4 MB).