
Improve generalization for neural visual-SLAM with Bayes online learning

Published online by Cambridge University Press:  31 March 2025

Jun Liu
Affiliation: School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing, People's Republic of China

Haihang Deng*
Affiliation: School of Physics and Electronic Engineering, Northeast Petroleum University, Daqing, People's Republic of China

*Corresponding author: Haihang Deng; Email: [email protected]

Abstract

Many deep learning-based SLAM systems exhibit low accuracy and poor generalization on datasets outside their training distribution. This deficiency can produce significant errors in real-world applications, particularly in environments that differ markedly from those represented in the training set. This paper presents a methodology for enhancing the generalization of deep learning SLAM systems: it retains their strong feature-extraction performance while introducing an Exponential Moving Average (EMA) and Bayes online learning to improve generalization and localization accuracy across varied scenarios. Experimental validation using the Absolute Trajectory Error (ATE) metric demonstrates that the method reduces error by $20\%$ on the EuRoC dataset and by $35\%$ on the TUM dataset.
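The two ingredients named in the abstract, an EMA over network parameters and a Bayes online update, can each be sketched in a few lines. The decay value, the conjugate-Gaussian model, and all names below are illustrative assumptions, not the paper's implementation:

```python
def ema_update(ema_params, new_params, decay=0.99):
    """Blend freshly adapted parameters into an EMA copy in place.

    A high decay keeps the EMA close to its past values, smoothing the
    noisy per-scene updates that online adaptation produces.
    """
    for name, value in new_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


def bayes_online_update(prior_mean, prior_prec, obs, obs_prec):
    """One conjugate Gaussian update for a scalar quantity.

    Precisions (inverse variances) add; the posterior mean is the
    precision-weighted average of the prior mean and the observation,
    so each new measurement shifts the estimate by a shrinking amount.
    """
    post_prec = prior_prec + obs_prec
    post_mean = (prior_prec * prior_mean + obs_prec * obs) / post_prec
    return post_mean, post_prec


if __name__ == "__main__":
    # EMA: 0.9 * 1.0 + 0.1 * 2.0 = 1.1
    ema = {"w": 1.0}
    ema_update(ema, {"w": 2.0}, decay=0.9)

    # Bayes online: start at N(0, 1) and fold in two observations of 2.0
    mean, prec = 0.0, 1.0
    for x in (2.0, 2.0):
        mean, prec = bayes_online_update(mean, prec, x, obs_prec=1.0)
    print(ema["w"], mean, prec)
```

In a SLAM context the same pattern applies per network parameter or per state variable rather than to a single scalar; the point of the sketch is only the update rules themselves.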

Type: Research Article
Copyright: © The Author(s), 2025. Published by Cambridge University Press

