Bulletin of Surveying and Mapping ›› 2025, Vol. 0 ›› Issue (4): 9-13. doi: 10.13474/j.cnki.11-2246.2025.0402

• Artificial Intelligence and Vision Systems •

A visual SLAM method for merging semantic information in indoor dynamic scenes

WANG Yizhe1, ZHANG Ruiju1,2,3, WANG Jian1, XIE Xinrui1, HUANG Qicheng1

  1. School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China;
  2. Engineering Center of Representative Architecture and Ancient Architecture Database, Ministry of Education, Beijing 102616, China;
  3. Key Laboratory of Fine Reconstruction and Health Monitoring of Architectural Heritage, Beijing 102616, China
  • Received: 2024-06-20 Published: 2025-04-28
  • Corresponding author: ZHANG Ruiju. E-mail: zhangruiju@bucea.edu.cn
  • About the author: WANG Yizhe (born 2001), male, master's student, mainly engaged in research on visual localization and deep learning. E-mail: wangyizhe0619@163.com
  • Funding:
    National Natural Science Foundation of China (42274029; 42171416); Beijing Natural Science Foundation (8222011)

Abstract: Visual SLAM is a core technology for autonomous perception and navigation in intelligent devices and plays a key role in artificial intelligence and robotics. However, when a scene contains moving objects, the stability and localization accuracy of traditional visual SLAM algorithms degrade significantly. To address this problem, this paper proposes a SLAM scheme that fuses semantic information for indoor dynamic scenes. Built on the ORB-SLAM2 framework, the method introduces the GCNv2 network for deep feature extraction and uses YOLOv5 for semantic segmentation to identify dynamic objects. Combined with motion consistency analysis, it removes dynamic interference and improves robustness. In tests on the TUM standard dataset, the improved algorithm outperforms the original ORB-SLAM2 in indoor dynamic environments, improving average localization accuracy by 55.75%. This result demonstrates the effectiveness of the proposed method in complex dynamic environments.

Key words: visual SLAM, semantic information, feature extraction, dynamic scenes
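
The abstract names two cues for rejecting dynamic features: semantic regions of dynamic objects and a motion consistency check. The abstract does not say how the two are combined, so the following is only a minimal sketch under stated assumptions: detections are given as axis-aligned boxes, motion consistency is measured as the distance of a matched point to its epipolar line, and a point is rejected only when both cues agree. The function names, the box format, and the `max_dist` threshold are hypothetical and are not taken from the paper.

```python
import numpy as np


def epipolar_distance(pts_prev, pts_curr, F):
    """Distance (pixels) from each current point to the epipolar line of its match.

    pts_prev, pts_curr: (N, 2) arrays of matched pixel coordinates.
    F: (3, 3) fundamental matrix mapping previous-frame points to lines
       in the current frame (assumed to be estimated elsewhere).
    """
    ones = np.ones((len(pts_prev), 1))
    x1 = np.hstack([pts_prev, ones])          # homogeneous previous points
    x2 = np.hstack([pts_curr, ones])          # homogeneous current points
    lines = x1 @ F.T                          # row i is F @ x1[i]
    num = np.abs(np.sum(lines * x2, axis=1))  # |x2^T F x1|
    den = np.hypot(lines[:, 0], lines[:, 1])  # line normal length
    return num / np.maximum(den, 1e-12)


def inside_any_box(pts, boxes):
    """True for points inside at least one (x_min, y_min, x_max, y_max) box."""
    inside = np.zeros(len(pts), dtype=bool)
    for x_min, y_min, x_max, y_max in boxes:
        inside |= (
            (pts[:, 0] >= x_min) & (pts[:, 0] <= x_max)
            & (pts[:, 1] >= y_min) & (pts[:, 1] <= y_max)
        )
    return inside


def static_mask(pts_prev, pts_curr, F, boxes, max_dist=1.0):
    """Keep a point unless it lies in a dynamic-object box AND violates
    the epipolar constraint by more than max_dist pixels (assumed rule)."""
    moving = epipolar_distance(pts_prev, pts_curr, F) > max_dist
    return ~(inside_any_box(pts_curr, boxes) & moving)
```

Requiring both cues keeps features on objects of a dynamic class that happen to be stationary, such as a seated person, while still discarding features on objects that are actually moving. Other combination rules are equally plausible.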
