ADM-YOLOv11: dynamic adaptive multi-scale object detection algorithm for tower video

doi:10.13474/j.cnki.11-2246.2025.0912

Abstract

To address the insufficient detection accuracy for multi-scale targets and the low recognition rate for small targets in wide-view dynamic scenes of tower videos, this paper proposes ADM-YOLOv11, an object detection algorithm based on the collaborative optimization of multiple modules. First, an adaptive feature enhancement (AFE) module is embedded in the Backbone network to restructure the C3K2 module; through spatial context awareness and feature refinement, it significantly improves the network's feature extraction in complex scenes. Second, the efficient multi-scale attention (EMA) module is integrated into the C3K2 module of the Neck to make the detection of multi-scale targets more robust. Third, the ultra-lightweight dynamic upsampler DySample replaces the traditional upsampling layer in the Neck, improving the detail expression and semantic fusion efficiency of multi-scale features. Finally, the EMASlideLoss classification loss function is adopted; its dynamic weighting strategy suppresses the gradient shift caused by data imbalance and improves the generalization performance of the model. Experimental results show that the proposed model raises mAP50-95 from 74.8% for the baseline model to 82.6% (a gain of 7.8 percentage points) and raises mAP50 to 96.6%. ADM-YOLOv11 thus significantly improves the detection accuracy of multi-scale targets in dynamic tower-video scenes.

Key words: tower video, multi-scale, YOLOv11, adaptive feature enhancement, dynamic upsampling
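The abstract describes EMASlideLoss only as a dynamic weighting strategy and gives no formula. As a rough illustration, the sketch below assumes the commonly described Slide Loss weighting (weight 1 for samples well below a threshold, a boosted weight near the threshold, and an exponentially decaying weight above it), with the threshold tracked as an exponential moving average of the batch mean IoU. The class name `EmaSlideWeight` and the values of `decay`, `tau`, `floor`, and the 0.1 margin are assumptions made for illustration, not settings taken from the paper.

```python
import math


class EmaSlideWeight:
    """Hypothetical helper: slide-style sample weights with an EMA threshold."""

    def __init__(self, decay=0.999, tau=2000.0, floor=0.2):
        self.decay = decay    # assumed upper bound of the EMA decay factor
        self.tau = tau        # assumed warm-up constant for the decay ramp
        self.floor = floor    # assumed lower bound for the threshold
        self.updates = 0
        self.iou_mean = 1.0   # assumed initial threshold value

    def update(self, batch_mean_iou):
        """Blend the current batch's mean IoU into the running threshold."""
        self.updates += 1
        # The decay ramps up from near 0, so early batches dominate at first.
        d = self.decay * (1.0 - math.exp(-self.updates / self.tau))
        self.iou_mean = d * self.iou_mean + (1.0 - d) * batch_mean_iou
        return self.iou_mean

    def weight(self, iou):
        """Return the multiplier applied to one sample's classification loss."""
        mu = max(self.iou_mean, self.floor)
        if iou <= mu - 0.1:
            return 1.0                    # clearly below the threshold
        if iou < mu:
            return math.exp(1.0 - mu)     # just below the threshold: boosted
        return math.exp(1.0 - iou)        # at or above: decays toward 1


weigher = EmaSlideWeight()
weigher.update(0.5)  # pretend the first batch had a mean IoU of 0.5
weights = [weigher.weight(iou) for iou in (0.2, 0.45, 0.9)]
```

In this sketch, samples whose IoU lies just below the running threshold receive the largest multiplier, which is one plausible way a loss could shift emphasis toward hard examples when easy ones dominate the data.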

CLC number: P237

TANG Zhiqing, ZHANG Tao, WANG Peiyu, XIANG Dao, LIU Haifei, LIU Renfeng, HE Jiangjiang. ADM-YOLOv11: dynamic adaptive multi-scale object detection algorithm for tower video[J]. Bulletin of Surveying and Mapping, 2025(9): 70-77.
