Multi-Patch multi-frame incremental traffic video object detection method based on YOLO v4

doi:10.13474/j.cnki.11-2246.2022.0138

Abstract

Improving the generalization ability of object detection models is a central research problem in computer vision. This paper proposes a Multi-Patch method and a multi-frame incremental prediction strategy that improve the robustness of traffic video object detection across scenarios and address the low object recall caused by variable object scales in video. The Multi-Patch method automatically divides each video frame into patches of the most suitable input size according to the video resolution and the object size. A YOLO v4 neural network is then used to associate context information across consecutive frames, and the incremental prediction strategy reduces the missed-detection rate while raising the detection confidence score and recall of video objects in different scenarios. The algorithm is validated on traffic videos collected under different shooting conditions. Experimental results show that the proposed method achieves a recall of more than 80% and an average confidence score of more than 0.84.
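The frame-splitting idea can be illustrated with a minimal sketch. The abstract does not give the rule the authors use to choose the patch size, so the fixed 608-pixel patch, the 20% overlap, and the helper names `split_into_patches` and `to_frame_coords` below are all assumptions made for illustration, not the paper's implementation. The sketch only shows how a frame might be tiled into overlapping patches and how a box detected inside a patch could be mapped back to frame coordinates.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixel coordinates
Box = Tuple[int, int, int, int]


def _starts(length: int, patch: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the patches cover [0, length)."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch, stride))
    starts.append(length - patch)  # last patch sits flush with the far edge
    return starts


def split_into_patches(width: int, height: int,
                       patch: int = 608, overlap: float = 0.2) -> List[Box]:
    """Tile a width x height frame into overlapping square patches.

    patch and overlap are illustrative defaults (assumptions), not values
    taken from the paper.
    """
    stride = max(1, int(patch * (1.0 - overlap)))
    return [
        (x, y, min(x + patch, width), min(y + patch, height))
        for y in _starts(height, patch, stride)
        for x in _starts(width, patch, stride)
    ]


def to_frame_coords(box: Box, patch: Box) -> Box:
    """Shift a box predicted inside a patch back into frame coordinates."""
    px, py, _, _ = patch
    x0, y0, x1, y1 = box
    return (x0 + px, y0 + py, x1 + px, y1 + py)


if __name__ == "__main__":
    for p in split_into_patches(1920, 1080):
        print(p)
```

In a full pipeline, each patch would be passed to the detector and the shifted boxes from overlapping patches would be merged, for example with non-maximum suppression. That merging step is omitted here.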

Key words: video object detection, multi-frame fusion, YOLO v4, convolutional neural networks

CLC Number: P237

WEN Nu, GUO Renzhong, HE Biao, WAN Yuan. Multi-Patch multi-frame incremental traffic video object detection method based on YOLO v4[J]. Bulletin of Surveying and Mapping, 2022, 0(5): 38-44.
