测绘通报 ›› 2026, Vol. 0 ›› Issue (5): 103-109.doi: 10.13474/j.cnki.11-2246.2026.0517

• 学术研究 • 上一篇    

基于视觉状态空间模型的城市遥感影像语义分割方法

陈冲1, 杨扬2   

  1. 1. 湖南建筑高级技工学校, 湖南 长沙 410015;
    2. 山西省测绘地理信息院, 山西 太原 030001
  • 收稿日期:2025-12-26 发布日期:2026-06-09
  • 作者简介:陈冲(1986—),男,讲师,主要研究方向为工程测量、大地测量及遥感影像处理。E-mail:33706138@qq.com
  • 基金资助:
    湖南省技工教育教学研究重点课题(jykt202402)

A semantic segmentation method for urban remote sensing images based on visual state space models

CHEN Chong1, YANG Yang2   

  1. 1. Hunan Construction Technical College, Changsha 410015, China;
    2. Shanxi Institute of Surveying and Mapping Geographic Information, Taiyuan 030001, China
  • Received:2025-12-26 Published:2026-06-09

摘要: [目的] 针对复杂城市场景遥感影像地物尺度差异大、边界模糊与类别混淆等问题,本文提出了一种基于视觉状态空间模型的语义分割方法。[方法] 设计双分支协同编码器融合全局上下文与局部多尺度特征,并引入跨分支协同机制实现动态交互;解码端采用状态空间驱动的逐级解码策略恢复高分辨率语义。[结果] 在长沙市典型城市场景影像上,OA、mIoU、mF1分别为91.04%、73.46%、84.18%,较RS3Mamba分别提升0.93、1.08、0.91个百分点;道路与建筑物等结构性类别表现更稳定。[结论] 本文方法可有效提升复杂城市场景语义分割精度与稳健性,为高分辨率遥感影像精细解译提供可行技术路径。

关键词: 遥感影像, 语义分割, 视觉状态空间模型, 城市场景, 深度学习

Abstract: [Purposes] To address the problems of large-scale variation,blurred boundaries,and category confusion in complex urban remote sensing images,a semantic segmentation method based on visual state space models is proposed.[Methods] A dual-branch collaborative encoder is designed to integrate global contextual information and local multi-scale features,and a cross-branch collaboration mechanism is introduced for dynamic feature interaction.A state space-driven progressive decoding strategy is employed to restore high-resolution semantic representations.[Findings] Experiments on typical urban remote sensing images of Changsha show that the proposed method achieves an overall accuracy (OA)of 91.04%,a mean intersection over union (mIoU)of 73.46%,and a mean F1-score (mF1)of 84.18%,outperforming RS3Mamba by 0.93,1.08,and 0.91 percentage points,respectively.More stable performance is observed for structural classes such as roads and buildings.[Conclusions] The results demonstrate that the proposed method effectively improves segmentation accuracy and robustness in complex urban scenes,providing a feasible technical approach for fine interpretation of high-resolution remote sensing images.

Key words: remote sensing image, semantic segmentation, visual state space model, urban scene, deep learning

中图分类号: