A semantic segmentation method for urban remote sensing images based on visual state space models

doi:10.13474/j.cnki.11-2246.2026.0517

Abstract

Abstract: [Purposes] To address the large variation in object scale, blurred boundaries, and category confusion found in remote sensing images of complex urban scenes, this paper proposes a semantic segmentation method based on visual state space models. [Methods] A dual-branch collaborative encoder fuses global context with local multi-scale features, and a cross-branch collaboration mechanism enables dynamic interaction between the branches. On the decoder side, a state space-driven progressive decoding strategy restores high-resolution semantics. [Findings] On typical urban scene images of Changsha, the method achieves an overall accuracy (OA) of 91.04%, a mean intersection over union (mIoU) of 73.46%, and a mean F1-score (mF1) of 84.18%, which are 0.93, 1.08, and 0.91 percentage points higher than RS3Mamba, respectively. Performance is also more stable on structural classes such as roads and buildings. [Conclusions] The proposed method improves segmentation accuracy and robustness in complex urban scenes and offers a feasible technical route for the fine-grained interpretation of high-resolution remote sensing images.

Key words: remote sensing image, semantic segmentation, visual state space model, urban scene, deep learning

CLC number: P237

CHEN Chong, YANG Yang. A semantic segmentation method for urban remote sensing images based on visual state space models[J]. Bulletin of Surveying and Mapping, 2026(5): 103-109.
