Bulletin of Surveying and Mapping ›› 2022, Vol. 0 ›› Issue (5): 38-44. doi: 10.13474/j.cnki.11-2246.2022.0138

• Academic Research •

Multi-Patch multi-frame incremental traffic video object detection method based on YOLO v4

WEN Nu1,2,3, GUO Renzhong1,2,3, HE Biao1,2,3,4, WAN Yuan5

  1. School of Architecture & Urban Planning, Shenzhen University, Shenzhen 518061, China;
  2. Research Institute for Smart Cities, Shenzhen University, Shenzhen 518061, China;
  3. Guangdong-Hong Kong-Macau Joint Laboratory for Smart Cities, Shenzhen 518061, China;
  4. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518034, China;
  5. College of Urban and Environmental Sciences, Hubei Normal University, Huangshi 435002, China
  • Received: 2021-06-07  Revised: 2022-02-25  Published: 2022-06-08
  • Corresponding author: HE Biao. E-mail: whu_hebiao@hotmail.com
  • About the author: WEN Nu (1989-), male, PhD, works on traffic video object detection, image processing, and computer vision. E-mail: wennu1989@126.com
  • Supported by:
    Guangdong Province Science and Technology Innovation Strategy Special Project (2020B1212030009); Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2018-03-031)

Abstract: Improving the generalization ability of object detection models is both a research focus and a key difficulty in computer vision. This paper proposes a Multi-Patch method and a multi-frame incremental prediction strategy that improve the robustness of traffic video object detection across different scenes and address the low recall caused by highly variable object scales in video. Based on the video resolution and the object size, the Multi-Patch method automatically splits each video frame into patches of the optimal input size. The YOLO v4 neural network is then applied, context information from consecutive frames is associated, and an incremental prediction strategy is used to reduce the missed detection rate and to raise the detection confidence score and recall of video objects in different scenes. Traffic videos collected under different shooting conditions are used to verify the effectiveness of the method. Experimental results show that the proposed method achieves a recall above 80% and an average confidence score above 0.84.

Key words: video object detection, multi-frame fusion, YOLO v4, convolutional neural networks
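The multi-frame incremental prediction strategy is described in the abstract only at a high level. The sketch below illustrates one plausible form of multi-frame fusion, not the authors' algorithm: a detection from earlier frames that has no overlapping match in the current frame is carried forward for a few frames with a decayed score, which lowers the missed detection rate when the detector briefly loses an object. The IoU threshold, the `max_age` limit, and the decay factor are all hypothetical parameters.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Detection:
    box: Box
    score: float
    age: int = 0  # frames since the detector last confirmed this object


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(previous: List[Detection], current: List[Detection],
         iou_thr: float = 0.5, max_age: int = 3,
         decay: float = 0.9) -> List[Detection]:
    """Keep all current detections and carry forward unmatched earlier ones."""
    fused = [Detection(d.box, d.score, 0) for d in current]
    for old in previous:
        matched = any(iou(old.box, new.box) >= iou_thr for new in current)
        # Assumed rule: an unmatched object survives at most max_age frames,
        # losing a fixed fraction of its confidence on each carried frame.
        if not matched and old.age + 1 <= max_age:
            fused.append(Detection(old.box, old.score * decay, old.age + 1))
    return fused
```

Calling `fuse` once per frame, with the previous call's result as `previous`, gives the incremental behavior. This sketch assumes objects move little between consecutive frames; a fuller version would also predict box motion and might raise the score of objects confirmed in several frames.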
