Improved multi-task road feature extraction network and weight optimization

doi:10.13474/j.cnki.11-2246.2023.0350

Abstract

Autonomous driving in complex road environments requires several perception tasks to work together. In natural language processing and recommendation systems, multi-task learning networks have been shown to reduce the time, computation, and storage needed when multiple tasks are coupled, and in recent years they have also been applied to vision-based road feature extraction. This paper proposes a decoder head structure that incorporates a feature pyramid network (FPN) and applies it to a multi-task road feature extraction network built on YOLOv4. It also studies how the choice of task weights affects a multi-task network, uses the findings to optimize the algorithm, and verifies the effectiveness of the weight settings on comparable algorithms. Experiments on the BDD-100K dataset show that the proposed structure achieves higher accuracy than comparable methods while maintaining real-time performance. The method offers a new approach to autonomous road perception by vehicles and to high-definition map generation in vision-based autonomous driving.

Key words: road feature extraction, multi-task learning network, weight optimization, traffic object detection, lane line segmentation, drivable area segmentation
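The abstract does not state how the task losses are combined, so the following is only a minimal sketch of one common formulation, a fixed weighted sum of per-task losses. The function name, the task names, and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import Dict


def weighted_multitask_loss(losses: Dict[str, float],
                            weights: Dict[str, float]) -> float:
    """Combine per-task losses into one scalar using fixed task weights.

    Assumption: the total loss is a plain weighted sum. The paper's actual
    weighting scheme is not described in the abstract.
    """
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for tasks: {sorted(missing)}")
    return sum(weights[task] * value for task, value in losses.items())


# Hypothetical per-task loss values and weights, for illustration only.
example_losses = {"detection": 0.8, "lane": 0.4, "drivable_area": 0.2}
example_weights = {"detection": 1.0, "lane": 0.5, "drivable_area": 0.5}
total = weighted_multitask_loss(example_losses, example_weights)
```

Under this assumption, raising one task's weight makes the shared backbone favour that task during training, which is why the weight setting can affect the accuracy of all three outputs.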

CLC number: P237

ZHU Wenjie, LI Hongwei, JIANG Yirui, CHENG Xianglong, ZHAO Shan. Improved multi-task road feature extraction network and weight optimization[J]. Bulletin of Surveying and Mapping, 2023, 0(12): 1-7.
