Bulletin of Surveying and Mapping (测绘通报) ›› 2023, Vol. 0 ›› Issue (3): 10-15. doi: 10.13474/j.cnki.11-2246.2023.0064

• Academic Research •

A dynamic commodity visual recognition method based on domain adaptation

LEI Yangyang (雷洋洋)1, LI Li (李礼)1, SUN Fei (孙飞)1, YAO Jian (姚剑)1,2

  1. School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China;
  2. AI Application and Innovation Research Center, The Open University of Guangdong, Guangzhou 510091, China
  • Received: 2022-03-31 Published: 2023-04-04
  • Corresponding author: YAO Jian. E-mail: jian.yao@whu.edu.cn
  • About the author: LEI Yangyang (1997-), female, master's student; research interests: image detection and recognition. E-mail: leiyangyang@whu.edu.cn
  • Funding:
    CCF-Baidu Songguo Fund (CCF-百度松果基金, OF2021023); Shenzhen special fund for central government guidance of local science and technology development (2021Szvup100)

Abstract: Dynamic visual recognition of commodities remains highly challenging in practice because of deformation, occlusion, motion blur, similar appearance among commodities, and unknown distribution shift in real scenes. This paper proposes a dynamic commodity visual recognition method for smart retail: an object detection network first detects the bounding box of each commodity in real time, and the commodity category is then recognized from the detected region and recommended to assist checkout. To address the cross-domain differences among commodity-picking videos, commodity library images, and training images, a neighborhood style adaptive model (IBN) and the convolutional block attention module (CBAM) are introduced to improve the domain adaptability of the model. To validate the method, a real-scene dataset, Commodity247, was built. Its data were captured by the top-view camera of a smart vending cabinet and comprise 247 classes of common retail commodities and 37 050 images annotated with bounding boxes and commodity categories. Experiments on Commodity247 show that the method reaches a recognition accuracy (mAP) of 96.84%, a first-recommendation accuracy (Rank1) of 98.41%, and a hardest-sample retrieval accuracy (mINP) of 85.24%. Compared with the baseline model built on ResNet, mAP improves by 2.91%, Rank1 by 0.60%, and mINP by 10.86%, which effectively reduces the influence of varying viewing angles, lighting, and backgrounds.

Key words: dynamic commodity visual recognition, instance batch normalization (IBN), attention mechanism, commodity recognition dataset
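The abstract reports mAP, Rank1, and mINP without giving their formulas. As a minimal sketch, assuming the definitions commonly used for retrieval tasks (AP as the mean precision at each correct hit, Rank1 as whether the top result is correct, INP as the number of correct matches divided by the rank of the last correct match), the per-query values could be computed as below. The function names and the commodity labels in the usage note are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def query_metrics(ranked_labels: List[str], query_label: str) -> Dict[str, float]:
    """Rank-1, AP and INP for one query.

    ranked_labels: gallery labels sorted by descending similarity to the query
                   (assumed input format, not specified in the abstract).
    """
    hits = [label == query_label for label in ranked_labels]
    num_correct = sum(hits)
    if num_correct == 0:
        raise ValueError("query has no correct match in the gallery")

    precisions = []      # precision measured at each correct hit
    found = 0            # correct matches seen so far
    last_hit_rank = 0    # rank of the hardest (last) correct match
    for rank, is_hit in enumerate(hits, start=1):
        if is_hit:
            found += 1
            precisions.append(found / rank)
            last_hit_rank = rank

    return {
        "rank1": 1.0 if hits[0] else 0.0,
        "ap": sum(precisions) / num_correct,
        "inp": num_correct / last_hit_rank,
    }


def mean_metrics(per_query: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-query values to obtain Rank1, mAP and mINP."""
    n = len(per_query)
    return {
        "Rank1": sum(r["rank1"] for r in per_query) / n,
        "mAP": sum(r["ap"] for r in per_query) / n,
        "mINP": sum(r["inp"] for r in per_query) / n,
    }
```

For example, `query_metrics(["cola", "tea", "cola", "water"], "cola")` has correct matches at ranks 1 and 3, so Rank1 is 1, AP is (1/1 + 2/3) / 2, and INP is 2/3. Under these definitions, INP is lowered by a single correct match that is ranked late, which is why mINP serves as the hardest-sample measure.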

CLC number: