To address the low positioning accuracy and weak robustness of prior visual simultaneous localization and mapping (VSLAM) systems in dynamic environments, this article proposes a semantic VSLAM (Sem-VSLAM) approach based on deep learning. Sem-VSLAM adds a parallel semantic segmentation thread to the visual odometry of the open-source ORB-SLAM2. First, while ORB features are extracted from an RGB-D image, the frame is semantically segmented, and the segmentation results are detected and repaired. Then, feature points belonging to dynamic objects are eliminated using semantic information and a motion-consistency check, and the camera pose is estimated from the remaining feature points. Finally, a 3D point cloud map is constructed from the tracking and semantic information. Experiments on the public Technical University of Munich (TUM) dataset demonstrate the usefulness of the Sem-VSLAM algorithm: in highly dynamic environments, Sem-VSLAM reduces the absolute trajectory error and relative pose error of pose estimation by about 95% compared with ORB-SLAM2 and by about 14% compared with VO-YOLOv5s, while the average tracking time per frame is 61 ms. These results verify that Sem-VSLAM effectively improves robustness and positioning accuracy in highly dynamic environments while retaining satisfactory real-time performance, and it therefore yields a better mapping result in such environments.
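The dynamic-feature elimination step described above can be sketched as follows. This is a minimal illustration under assumed interfaces, not the paper's implementation: `filter_dynamic_keypoints`, the binary `dynamic_mask` (from semantic segmentation of dynamic classes), and the per-point `flow_residuals` (reprojection residuals used as a motion-consistency proxy) are all hypothetical names introduced here.

```python
import numpy as np

def filter_dynamic_keypoints(keypoints, dynamic_mask, flow_residuals=None,
                             residual_thresh=2.0):
    """Keep keypoints that lie outside the segmented dynamic regions and
    whose motion is consistent with the estimated camera ego-motion.

    keypoints      : list of (x, y) pixel coordinates of ORB features
    dynamic_mask   : HxW boolean array, True where a dynamic object was segmented
    flow_residuals : optional per-keypoint motion residuals (pixels)
    """
    kept = []
    for i, (x, y) in enumerate(keypoints):
        if dynamic_mask[int(y), int(x)]:
            continue  # discard: feature lies on a segmented dynamic object
        if flow_residuals is not None and flow_residuals[i] > residual_thresh:
            continue  # discard: motion inconsistent with camera ego-motion
        kept.append((x, y))
    return kept

# Synthetic example: a 100x100 frame with one dynamic region.
mask = np.zeros((100, 100), dtype=bool)
mask[40:60, 40:60] = True                      # e.g. a segmented person
points = [(10, 10), (50, 50), (80, 20)]
residuals = [0.5, 0.3, 5.0]                    # third point moves independently

static_points = filter_dynamic_keypoints(points, mask, residuals)
# Only (10, 10) survives: (50, 50) is masked, (80, 20) fails the motion check.
```

Only the surviving static features would then be passed to pose estimation, which is why the dynamic objects stop corrupting the trajectory.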