
Improve generalization for neural visual-SLAM with Bayes online learning

Published online by Cambridge University Press:  31 March 2025

Jun Liu
Affiliation: School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing, People's Republic of China

Haihang Deng*
Affiliation: School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing, People's Republic of China

*Corresponding author: Haihang Deng; Email: [email protected]

Abstract

Many deep learning-based SLAM systems exhibit low accuracy and poor generalization on datasets outside their training distribution. This deficiency can produce significant errors in real-world applications, particularly in environments that differ markedly from those represented in the training set. This paper presents a methodology for enhancing the generalization of deep learning SLAM systems: it retains their strong feature-extraction performance while introducing an Exponential Moving Average (EMA) and Bayes online learning to improve generalization and localization accuracy across varied scenarios. Experimental validation using the Absolute Trajectory Error (ATE) metric demonstrates that the method reduces error by $20\%$ on the EuRoC dataset and by $35\%$ on the TUM dataset.
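The two ingredients named in the abstract, an EMA over network parameters and a Bayes online update, can each be sketched in a few lines. The decay value, the conjugate-Gaussian model, and all names below are illustrative assumptions, not the paper's implementation:

```python
def ema_update(ema_params, new_params, decay=0.99):
    """Blend freshly adapted parameters into an EMA copy in place.

    A high decay keeps the EMA close to its past values, smoothing the
    noisy per-scene updates that online adaptation produces.
    """
    for name, value in new_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


def bayes_online_update(prior_mean, prior_prec, obs, obs_prec):
    """One conjugate Gaussian update for a scalar quantity.

    Precisions (inverse variances) add; the posterior mean is the
    precision-weighted average of the prior mean and the observation,
    so each new measurement shifts the estimate by a shrinking amount.
    """
    post_prec = prior_prec + obs_prec
    post_mean = (prior_prec * prior_mean + obs_prec * obs) / post_prec
    return post_mean, post_prec


if __name__ == "__main__":
    # EMA: 0.9 * 1.0 + 0.1 * 2.0 = 1.1
    ema = {"w": 1.0}
    ema_update(ema, {"w": 2.0}, decay=0.9)

    # Bayes online: start at N(0, 1) and fold in two observations of 2.0
    mean, prec = 0.0, 1.0
    for x in (2.0, 2.0):
        mean, prec = bayes_online_update(mean, prec, x, obs_prec=1.0)
    print(ema["w"], mean, prec)
```

In a SLAM context the same pattern applies per network parameter or per state variable rather than to a single scalar; the point of the sketch is only the update rules themselves.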

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press

