Bulletin of Surveying and Mapping, 2025, Vol. 0, Issue (9): 70-77. doi: 10.13474/j.cnki.11-2246.2025.0912


ADM-YOLOv11: dynamic adaptive multi-scale object detection algorithm for tower video

TANG Zhiqing¹, ZHANG Tao¹, WANG Peiyu², XIANG Dao¹, LIU Haifei¹, LIU Renfeng¹, HE Jiangjiang¹

1. Hunan Second Institute of Surveying and Mapping, Changsha 410000, China;
2. Xiangtan University, Xiangtan 411105, China
Received: 2025-05-09; Published: 2025-09-29

Abstract: Tower videos capture dynamic scenes with a large field of view, where detection accuracy for multi-scale targets is insufficient and the recognition rate for small targets is low. To address these bottlenecks, this paper proposes ADM-YOLOv11, a target detection algorithm based on the collaborative optimization of multiple modules. First, an adaptive feature enhancement (AFE) module is embedded in the Backbone network to deeply reconstruct the C3K2 module; its spatial context awareness and feature refinement mechanisms significantly improve the network's ability to extract features from complex scenes. Second, the efficient multi-scale attention (EMA) module is integrated into the C3K2 module of the Neck, making detection of multi-scale targets more robust. Third, the ultra-lightweight dynamic upsampler DySample (dynamic upsample) replaces the conventional upsampling layers in the Neck, improving the detail expression and semantic fusion efficiency of multi-scale features. Finally, the EMASlideLoss classification loss function is adopted; its dynamic weighting strategy suppresses the gradient shift caused by data imbalance and effectively improves the generalization performance of the model. Experimental results show that the proposed model raises mAP50-95 from 74.8% for the baseline model to 82.6% (a gain of 7.8 percentage points) and raises mAP50 to 96.6%. ADM-YOLOv11 thus significantly improves the detection accuracy of multi-scale targets in the dynamic scenes of tower videos.
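The abstract names EMASlideLoss but does not give its formula. The following is a minimal sketch, assuming it follows the Slide Loss weighting introduced with YOLO-FaceV2. In that scheme a sample whose target value x satisfies x ≤ μ−0.1 keeps weight 1, a sample with μ−0.1 < x < μ gets weight e^(1−μ), and a sample with x ≥ μ gets weight e^(1−x). The threshold μ is further assumed to be an exponential moving average (EMA) of the batch mean IoU. The names EMASlideWeight and slide_bce, the decay 0.999, the ramp constant tau = 2000, the initial value 1.0 and the 0.2 floor on μ are hypothetical choices for illustration, not values reported in this paper.

```python
import math


class EMASlideWeight:
    """Hypothetical sketch of EMA-tracked Slide Loss weighting (not the paper's code)."""

    def __init__(self, decay: float = 0.999, tau: float = 2000.0, floor: float = 0.2):
        # Assumed schedule: the effective decay ramps from ~0 toward `decay`
        # as updates accumulate, so early batches move the EMA quickly.
        self.decay = decay
        self.tau = tau
        self.floor = floor      # assumed lower bound on the threshold mu
        self.updates = 0
        self.iou_mean = 1.0     # assumed initial EMA value

    def update(self, batch_mean_iou: float) -> float:
        """Fold one batch's mean IoU into the EMA and return the threshold mu."""
        self.updates += 1
        d = self.decay * (1.0 - math.exp(-self.updates / self.tau))
        self.iou_mean = d * self.iou_mean + (1.0 - d) * batch_mean_iou
        return max(self.iou_mean, self.floor)

    @staticmethod
    def weight(x: float, mu: float) -> float:
        """Slide weight for a sample whose target (IoU-like) value is x."""
        if x <= mu - 0.1:
            return 1.0                  # clearly easy/negative: unchanged
        if x < mu:
            return math.exp(1.0 - mu)   # just below the threshold: boosted
        return math.exp(1.0 - x)        # at or above the threshold


def slide_bce(logit: float, target: float, mu: float) -> float:
    """Binary cross-entropy on a single logit, scaled by the slide weight of its target."""
    # Numerically stable BCE-with-logits: max(z, 0) - z*y + log(1 + exp(-|z|)).
    bce = max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))
    return bce * EMASlideWeight.weight(target, mu)


if __name__ == "__main__":
    tracker = EMASlideWeight()
    mu = tracker.update(0.6)  # one batch with mean IoU 0.6
    print(round(mu, 4), slide_bce(0.3, 0.55, mu))
```

In a training loop, update() would be called once per batch with that batch's mean IoU, and the per-sample weighting would be applied elementwise to the classification loss (in practice on tensors rather than Python floats). The EMA keeps μ from jumping with individual batches, which is one plausible reading of the "dynamic weighting strategy" described above.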

Key words: tower video, multi-scale, YOLOv11, adaptive feature enhancement, dynamic upsampling
