Bulletin of Surveying and Mapping (测绘通报) ›› 2026, Vol. 0 ›› Issue (4): 73-80. doi: 10.13474/j.cnki.11-2246.2026.0411

• Academic Research •

Cross-modal underwater shipwreck recognition algorithm based on YOLOv8-CPCA

SUN Hao'an (孙颢桉), WANG Zhaoying (王朝莹), WANG Yu (王语)

  1. School of Geospatial Information, Information Engineering University, Zhengzhou 450001, Henan, China
  • Received: 2025-09-15 Published: 2026-05-12
  • Corresponding author: WANG Zhaoying. E-mail: xdyy121@163.com
  • About the author: SUN Hao'an (b. 2004), female; research interest: marine surveying and mapping. E-mail: 511886801@qq.com
  • Funding:
    National Natural Science Foundation of China (42574011); Key Laboratory Field Fund (2025-JCJQ-JJ-0286)

Abstract: In underwater target recognition, acoustic data are strongly affected by noise and prone to interference, while optical data become markedly harder to acquire as depth increases. This paper proposes fusing acoustic and optical images to improve recognition accuracy. Because paired underwater acoustic and optical image datasets are scarce, a Cycle-GAN network is used to augment the dataset samples, and the generated images then undergo image enhancement. On the recognition side, a Transformer cross-modal attention module and a channel prior convolutional attention (CPCA) mechanism are introduced into the YOLOv8 algorithm. Experiments on a side-scan sonar shipwreck target dataset show that, compared with recognition algorithms based on acoustic data alone or optical data alone, the cross-modal acoustic-optical fusion algorithm raises the mean average precision (mAP) by 0.175 and 0.165, respectively. The optimized backbone network integrates acoustic and optical features across modalities, improves feature extraction efficiency, and addresses the difficulty of distinguishing shipwreck edges from the surrounding environment.

Key words: target recognition, acoustic-optical fusion, Cycle-GAN, cross-modal attention, YOLO algorithm, fusion algorithm, Transformer
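
The cross-modal attention named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `cross_modal_attention`, the feature shapes, the random projection matrices, and the choice of acoustic features as queries with optical features as keys and values are all assumptions made for illustration. The sketch only shows the general idea of letting features from one modality attend to features from the other.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(acoustic, optical, w_q, w_k, w_v):
    """Hypothetical single-head cross-modal attention.

    acoustic: (N_a, C) flattened acoustic feature map (used as queries)
    optical:  (N_o, C) flattened optical feature map (used as keys/values)
    w_q, w_k: (C, D) projection matrices; w_v: (C, C)
    """
    q = acoustic @ w_q                       # (N_a, D)
    k = optical @ w_k                        # (N_o, D)
    v = optical @ w_v                        # (N_o, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product, (N_a, N_o)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    fused = acoustic + weights @ v           # residual connection, (N_a, C)
    return fused, weights

# Toy data: 20 acoustic positions, 30 optical positions, 16 channels (assumed sizes).
rng = np.random.default_rng(0)
C, D = 16, 8
acoustic = rng.standard_normal((20, C))
optical = rng.standard_normal((30, C))
w_q = rng.standard_normal((C, D)) * 0.1
w_k = rng.standard_normal((C, D)) * 0.1
w_v = rng.standard_normal((C, C)) * 0.1

fused, weights = cross_modal_attention(acoustic, optical, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In a trained network the projection matrices would be learned, and the fused features would feed the subsequent detection layers; here they are random so that the example runs on its own.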
