Bulletin of Surveying and Mapping ›› 2022, Vol. 0 ›› Issue (2): 145-148. doi: 10.13474/j.cnki.11-2246.2022.0060

• Technical Exchange •

Multi-strategy Chinese address matching method

PENG Yulong1,2, HU Shunshi1,2, WU Tao1,2

  1. College of Geographic Sciences, Hunan Normal University, Changsha 410081, China;
  2. Key Laboratory of Geospatial Big Data Mining and Application, Hunan Province, Hunan Normal University, Changsha 410081, China
  • Received: 2021-03-03 Revised: 2021-06-02 Published: 2022-03-11
  • Corresponding author: WU Tao. E-mail: blackender@163.com
  • About the author: PENG Yulong (1996-), male, master's student; main research direction: spatial data mining. E-mail: 1107658685@qq.com
  • Funding:
    Natural Science Foundation of Hunan Province (2018JJ3348); Scientific Research Project of the Hunan Provincial Department of Education (17C0952)

Abstract: Address matching is a key step in geocoding and one of the key technologies for spatializing data. Current Chinese address matching methods struggle to balance precision, matching rate, and time cost. To address this problem, this paper proposes a multi-strategy Chinese address matching method. The method builds a lightweight dictionary for Chinese address segmentation and constructs a multi-way tree to store the segmented address data. During matching, fuzzy matching and hierarchical backtracking matching are combined to complete the address match. Experiments on real data show that the method is better balanced across matching rate, precision, and time cost than other current matching methods.

Key words: address matching, Chinese address segmentation, multi-way tree, hierarchical backtracking, cosine similarity
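
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the forward-maximum-matching segmenter, the character-level cosine similarity, the 0.6 threshold, and the rule for skipping a level missing from the query are all assumptions chosen only to show how dictionary segmentation, a multi-way tree, fuzzy matching, and backtracking across levels might fit together.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy dictionary of address elements (an assumption for illustration).
DICTIONARY = {"湖南省", "长沙市", "岳麓区", "麓山南路", "麓山路", "36号"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(address):
    """Forward maximum matching; characters not in the dictionary become single tokens."""
    tokens, i = [], 0
    while i < len(address):
        for size in range(min(MAX_WORD_LEN, len(address) - i), 0, -1):
            piece = address[i:i + size]
            if size == 1 or piece in DICTIONARY:
                tokens.append(piece)
                i += size
                break
    return tokens


class Node:
    """One address element in a multi-way tree; children are keyed by token."""

    def __init__(self, token=""):
        self.token = token
        self.children = {}
        self.is_address = False  # True if a reference address ends at this node


def insert(root, tokens):
    """Store one segmented reference address as a root-to-node path."""
    node = root
    for token in tokens:
        node = node.children.setdefault(token, Node(token))
    node.is_address = True


def cosine(a, b):
    """Cosine similarity between character-count vectors of two strings."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def match(node, tokens, threshold=0.6):
    """Return the matched reference path as a token list, or None if nothing matches."""
    if not tokens:
        return [] if node.is_address else None
    head, rest = tokens[0], tokens[1:]
    # Exact child first, then the remaining children by descending similarity.
    ranked = sorted(node.children.values(),
                    key=lambda c: (c.token != head, -cosine(c.token, head)))
    # Exact or fuzzy match at this level; a dead end below backtracks to the next candidate.
    for child in ranked:
        if child.token == head or cosine(child.token, head) >= threshold:
            tail = match(child, rest, threshold)
            if tail is not None:
                return [child.token] + tail
    # Assumed fallback: the query omits this level, so descend without consuming a token.
    for child in ranked:
        tail = match(child, tokens, threshold)
        if tail is not None:
            return [child.token] + tail
    return None


root = Node()
insert(root, segment("湖南省长沙市岳麓区麓山南路36号"))

# The query omits the province and district and misspells the road name.
result = match(root, segment("长沙市麓山路36号"))
print(result)  # ['湖南省', '长沙市', '岳麓区', '麓山南路', '36号']
```

In this sketch the tree keeps lookups proportional to the number of address levels rather than the number of stored addresses, and the similarity threshold controls how aggressively near-miss tokens are accepted; both values would need tuning on real data.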

CLC Number: