Multi-Patch multi-frame incremental traffic video object detection method based on YOLO v4

doi:10.13474/j.cnki.11-2246.2022.0138

Bulletin of Surveying and Mapping, 2022, Vol. 0, Issue (5): 38-44.

WEN Nu^1,2,3, GUO Renzhong^1,2,3, HE Biao^1,2,3,4, WAN Yuan^5

1. School of Architecture & Urban Planning, Shenzhen University, Shenzhen 518061, China;
2. Research Institute for Smart Cities, Shenzhen University, Shenzhen 518061, China;
3. Guangdong-Hong Kong-Macau Joint Laboratory for Smart Cities, Shenzhen 518061, China;
4. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518034, China;
5. College of Urban and Environmental Sciences, Hubei Normal University, Huangshi 435002, China

Received: 2021-06-07 Revised: 2022-02-25 Published: 2022-06-08
Corresponding author: HE Biao. E-mail: whu_hebiao@hotmail.com
About the author: WEN Nu (1989-), male, PhD, engaged in research on traffic video object detection, image processing, and computer vision. E-mail: wennu1989@126.com
Funding:
Guangdong Science and Technology Innovation Strategy Special Project (2020B1212030009); Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2018-03-031)

Abstract

Abstract: Improving the generalization ability of object detection models is a research focus and a key difficulty in computer vision. This paper proposes a Multi-Patch method and a multi-frame incremental prediction strategy that improve the robustness of traffic video object detection across different scenes and effectively address the low object recall in videos caused by widely varying object scales. Based on the video resolution and object size, the Multi-Patch method automatically splits each video frame into patches of the best input size. The YOLO v4 neural network is then applied, the context information of consecutive frames is associated, and an incremental prediction strategy is used to reduce the missed detection rate and to raise the detection confidence scores and recall of video objects in different scenes. Traffic videos captured under different shooting conditions were collected to verify the effectiveness of the method. The experimental results show that the proposed method achieves a recall above 80% and an average confidence score above 0.84.

Key words: video object detection, multi-frame fusion, YOLO v4, convolutional neural networks
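
The abstract names two ideas, splitting each frame into patches and carrying detections across consecutive frames, without giving their details. The following is a minimal sketch of how such a pipeline might be organized, not the authors' implementation. The patch size of 608, the 20% overlap, the IoU threshold, the confidence decay, and the function names `split_into_patches` and `incremental_update` are all assumptions made for illustration; the detector itself (YOLO v4) is left out, and detections are represented as plain `(x1, y1, x2, y2, confidence)` tuples.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, confidence


def split_into_patches(width: int, height: int, patch: int = 608,
                       overlap: float = 0.2) -> List[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) windows that together cover the whole frame.

    Patch size and overlap are illustrative assumptions, not values from the paper.
    """
    step = max(1, int(patch * (1.0 - overlap)))

    def starts(size: int) -> List[int]:
        if size <= patch:
            return [0]
        pos = list(range(0, size - patch, step))
        pos.append(size - patch)  # last window sits flush with the far edge
        return pos

    return [(x, y, min(x + patch, width), min(y + patch, height))
            for y in starts(height) for x in starts(width)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (confidence is ignored)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def incremental_update(tracked: List[Dict], detections: List[Box],
                       iou_thr: float = 0.5, max_missed: int = 2,
                       decay: float = 0.9) -> List[Dict]:
    """Merge the current frame's detections with boxes kept from earlier frames.

    Hypothetical rule: a previously seen box with no match in the current frame
    is carried forward for up to max_missed frames with decayed confidence, so a
    single missed detection does not immediately drop the object.
    """
    updated: List[Dict] = []
    unmatched = list(detections)
    for item in tracked:
        best = max(unmatched, key=lambda d: iou(item["box"], d), default=None)
        if best is not None and iou(item["box"], best) >= iou_thr:
            unmatched.remove(best)
            updated.append({"box": best, "missed": 0})
        elif item["missed"] < max_missed:
            x1, y1, x2, y2, conf = item["box"]
            updated.append({"box": (x1, y1, x2, y2, conf * decay),
                            "missed": item["missed"] + 1})
    updated.extend({"box": d, "missed": 0} for d in unmatched)
    return updated
```

In use, each window returned by `split_into_patches` would be cropped from the frame and passed to the detector, the resulting boxes shifted by the window's `(x1, y1)` offset back into frame coordinates and de-duplicated (for example with non-maximum suppression), and the merged list handed to `incremental_update` together with the state from the previous frame.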

CLC number: P237

WEN Nu, GUO Renzhong, HE Biao, WAN Yuan. Multi-Patch multi-frame incremental traffic video object detection method based on YOLO v4[J]. Bulletin of Surveying and Mapping, 2022, 0(5): 38-44.


Article link: http://tb.chinasmp.com/CN/10.13474/j.cnki.11-2246.2022.0138

http://tb.chinasmp.com/CN/Y2022/V0/I5/38

