Bulletin of Surveying and Mapping ›› 2024, Vol. 0 ›› Issue (11): 61-67. doi: 10.13474/j.cnki.11-2246.2024.1111

• Academic Research •

  • Corresponding author: WANG Jian, E-mail: wangjian@bucea.edu.cn
  • About the first author: WANG Zhitao (1999-), male, master's student; his main research interests are image processing and computer vision. E-mail: wangztmeteo@163.com
  • Funding: Beijing Natural Science Foundation (8222011); National Natural Science Foundation of China (42274029)

Improved object detection algorithm HCAM-YOLO in traffic scenes based on YOLOv5

WANG Zhitao, ZHANG Ruiju, WANG Jian, ZHAO Jiaxing, LIU Yantao   

  1. School of Geomatics and Urban Spatial Information, Beijing University of Civil Engineering and Architecture, Beijing 102600, China
  • Received:2024-03-21 Published:2024-12-05



Abstract: Fast, accurate object detection in traffic scenes is crucial for intelligent traffic management and driving path decision-making. Owing to the complexity and variability of traffic environments and the diversity and sparsity of target features, traditional detection models often suffer from inadequate accuracy and high rates of missed and false detections. To address these challenges, this paper proposes HCAM-YOLO, a YOLOv5-based object detection model built on an HcPAN feature fusion network. The core of the approach is a hybrid convolutional attention mechanism (HCAM), designed to counter the loss of local information during the PAN network's feature fusion and to strengthen multi-scale feature extraction. Embedding the HCAM module in the bottom layers of the PAN increases sensitivity to key local features and improves the fusion of deep semantic information with shallow positional information. The novelty of the method lies in using an attention mechanism to optimize the feature fusion process, thereby improving detection of pedestrians, motor vehicles, and other targets in complex traffic environments. The experimental dataset combines the Rope 3D dataset with two self-built datasets, Road Veh (vehicle flow) and Road Ped (pedestrians). The results show that the HCAM module is better suited than other attention mechanisms for embedding in the bottom PAN layers. Compared with the baseline YOLOv5 model, the final HCAM-YOLO model improves precision and recall by 3.4% and 3.2%, respectively, and mAP@0.5 by 3.8%. The proposed model adapts well to object detection tasks in traffic scenes with complex backgrounds.
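The abstract does not spell out HCAM's internal structure, but the general family it belongs to — hybrid attention that reweights a feature map first across channels and then across spatial positions (as in CBAM-style modules) before it is fused in the PAN — can be sketched as follows. This is an illustrative NumPy mock-up with random stand-in weights, not the authors' actual design; the function name, reduction ratio, and layer layout are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hybrid_conv_attention(x, reduction=4, rng=None):
    """Illustrative hybrid (channel + spatial) attention on a feature
    map x of shape (C, H, W). Weights are random stand-ins; in a real
    network they would be learned end to end."""
    rng = np.random.default_rng(0) if rng is None else rng
    C, H, W = x.shape

    # --- channel attention: squeeze spatial dims, excite channels ---
    avg_pool = x.mean(axis=(1, 2))                  # (C,)
    max_pool = x.max(axis=(1, 2))                   # (C,)
    W1 = rng.standard_normal((C // reduction, C)) * 0.1
    W2 = rng.standard_normal((C, C // reduction)) * 0.1
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)    # shared 2-layer MLP
    ch_att = sigmoid(mlp(avg_pool) + mlp(max_pool)) # (C,)
    x = x * ch_att[:, None, None]

    # --- spatial attention: squeeze channels, excite locations ---
    avg_map = x.mean(axis=0)                        # (H, W)
    max_map = x.max(axis=0)                         # (H, W)
    stacked = np.stack([avg_map, max_map])          # (2, H, W)
    # 3x3 conv over the 2-channel [avg, max] stack, zero-padded
    k = rng.standard_normal((2, 3, 3)) * 0.1
    padded = np.pad(stacked, ((0, 0), (1, 1), (1, 1)))
    sp_att = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            sp_att[i, j] = np.sum(padded[:, i:i+3, j:j+3] * k)
    return x * sigmoid(sp_att)[None, :, :]

feat = np.random.default_rng(1).standard_normal((8, 4, 4))
out = hybrid_conv_attention(feat)
print(out.shape)  # (8, 4, 4)
```

Because both attention branches only rescale the input (output shape equals input shape), such a module can be dropped into an existing PAN stage without changing the downstream tensor dimensions — which is what makes embedding it in the bottom fusion layers straightforward.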

Key words: object detection, traffic scene, feature enhancement, attention mechanism, YOLOv5
