A dynamic commodity visual recognition method based on domain adaptation

doi:10.13474/j.cnki.11-2246.2023.0064

Abstract

Large deformation, occlusion, motion blur, similar appearance between items, and unknown distribution shift in real scenes make dynamic visual recognition of items highly challenging in practical applications. To address this, this paper proposes a dynamic commodity visual recognition method for smart retail. First, an object detection network detects the bounding box of each commodity in real time; the commodity category is then identified from the detected region, and recommendations are given to assist consumer checkout. To handle the cross-domain differences among product-picking videos, product library images, and training images, the paper introduces instance-batch normalization (IBN) and the convolutional block attention module (CBAM) to improve the domain adaptability of the model. To verify the effectiveness of the method, the paper constructs a real-scene dataset, Commodity247. The data are collected by the top-view camera of a smart container and comprise 247 common retail commodities and 37,050 images annotated with bounding boxes and commodity categories. Experiments on the Commodity247 dataset show that the mean average precision of product recognition (mAP) reaches 96.84%, the accuracy of the first recommendation (Rank1) reaches 98.41%, and the retrieval metric for the hardest samples (mINP) reaches 85.24%. Compared with the ResNet-based baseline model, mAP increases by 2.91%, Rank1 by 0.60%, and mINP by 10.86%, effectively reducing the influence of varying viewing angles, lighting, and backgrounds.
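The three reported metrics can be illustrated with a minimal sketch. It assumes the definitions commonly used in retrieval and re-identification work: per-query average precision, a top-1 hit indicator, and the inverse negative penalty (number of correct matches divided by the rank of the last correct match), each averaged over all queries. The function names `query_metrics` and `evaluate` and the toy ranked lists are hypothetical and are not taken from the paper, whose evaluation code may differ.

```python
from typing import Dict, List, Tuple


def query_metrics(matches: List[bool]) -> Tuple[float, float, float]:
    """Return (AP, Rank-1, INP) for one query.

    `matches[i]` is True when the gallery item ranked at position i + 1
    has the same commodity category as the query.
    """
    hits = 0
    precision_sum = 0.0
    last_hit_rank = 0
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this correct match
            last_hit_rank = rank          # rank of the hardest match so far
    if hits == 0:
        raise ValueError("query has no correct match in the gallery")
    ap = precision_sum / hits             # average precision
    rank1 = 1.0 if matches[0] else 0.0    # is the first recommendation correct?
    inp = hits / last_hit_rank            # inverse negative penalty
    return ap, rank1, inp


def evaluate(all_matches: List[List[bool]]) -> Dict[str, float]:
    """Average the per-query metrics over all queries."""
    per_query = [query_metrics(m) for m in all_matches]
    n = len(per_query)
    return {
        "mAP": sum(q[0] for q in per_query) / n,
        "Rank1": sum(q[1] for q in per_query) / n,
        "mINP": sum(q[2] for q in per_query) / n,
    }


# Toy example: two queries, each with a ranked list of four gallery items.
scores = evaluate([
    [True, False, True, False],
    [False, True, True, True],
])
print(scores)
```

In this toy case the second query fails at rank 1, so Rank1 is 0.5, while mINP penalises the first query for placing its last correct match at rank 3 rather than rank 2. Under these definitions mINP is sensitive to the hardest correct sample of each query, which is consistent with the abstract reporting its largest gain on that metric.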

Key words: dynamic commodity recognition, instance batch normalization, attention, commodity recognition dataset

CLC Number: P237

LEI Yangyang, LI Li, SUN Fei, YAO Jian. A dynamic commodity visual recognition method based on domain adaptation[J]. Bulletin of Surveying and Mapping, 2023, 0(3): 10-15.
