测绘通报 ›› 2020, Vol. 0 ›› Issue (1): 40-44.doi: 10.13474/j.cnki.11-2246.2020.0009

• 学术研究 • 上一篇    下一篇

基于图像语义分割的动态场景下的单目SLAM算法

盛超, 潘树国, 赵涛, 曽攀, 黄砺枭   

  1. 东南大学仪器科学与工程学院, 江苏 南京 210096
  • 收稿日期:2019-05-12 修回日期:2019-07-02 发布日期:2020-02-10
  • 通讯作者: 潘树国。E-mail:psg@seu.edu.cn E-mail:psg@seu.edu.cn
  • 作者简介:盛超(1996-),男,硕士生,主要从事深度学习和视觉定位方面的研究。E-mail:seushengchao@seu.edu.cn
  • 基金资助:
    江苏省测绘地理信息科研项目(JSCHKY201808);国家重点研发计划(2016YFB0502101);国家自然科学基金(41574026;41774027)

Monocular SLAM system in dynamic scenes based on image semantic segmentation

SHENG Chao, PAN Shuguo, ZHAO Tao, ZEN Pan, HUANG Lixiao   

  1. School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China
  • Received:2019-05-12 Revised:2019-07-02 Published:2020-02-10

摘要: 在恢复场景信息和相机运动时,传统的SLAM算法是基于静态环境假设的。场景中的动态物体会降低算法的稳健性和最终的定位精度。本文提出将基于深度学习的图像语义分割技术与传统的视觉SLAM算法结合,以减少动态物体对定位结果的干扰。首先,构建有监督的卷积神经网络对输入图像中的动态物体进行分割,获得语义图像;然后,从原始图像中提取特征点,并根据语义图像剔除动态物体特征点,保留静态物体特征点;最后,利用静态物体特征点采用基于特征点的单目视觉SLAM算法对相机运动进行跟踪。在ApolloScape自动驾驶数据集上的试验表明,与传统方法相比,本文算法在动态场景中定位精度提升约17%。

关键词: 单目视觉SLAM, 动态物体, 卷积神经网络, 语义分割, 深度学习

Abstract: The traditional visual SLAM algorithm is based on the static environment assumption when recovering scene information and camera motion. The dynamic objects in the scene will reduce the robustness of the algorithm and the final positioning accuracy. In this paper, we propose to combine the image semantic segmentation based on deep learning method with the traditional visual SLAM to reduce the interference of dynamic objects on the positioning results. Firstly, a supervised convolutional neural network (CNN) is used to segment the dynamic objects in the input image to obtain the semantic image. Secondly, after extracting feature points from the original image, the feature points of dynamic objects are eliminated object feature points according to the semantic image, and the feature points of static objects are saved. Finally, the monocular SLAM method with points is used to track the camera motion based on the saved feature points. Experiments on the ApolloScape datasets show that compared with the traditional method, the proposed method improves the positioning accuracy in dynamic scenes by about 17%.

Key words: monocular visual SLAM, dynamic objects, CNN, semantic segmentation, deep learning method

中图分类号: