Bulletin of Surveying and Mapping ›› 2021, Vol. 0 ›› Issue (10): 108-113. doi: 10.13474/j.cnki.11-2246.2021.315

• Academic Research •

Evaluation of the credibility of multi-source address elements: a case study of road features

SUN Licai1,2,3,4, CHEN Yisong5, XIONG Jie5, LUO An2, WANG Yong1,2

  1. Faculty of Geomatics, Lanzhou Jiaotong University, Lanzhou 730070, China;
  2. Chinese Academy of Surveying & Mapping, Beijing 100036, China;
  3. National-Local Joint Engineering Research Center of Technologies and Applications for National Geographic State Monitoring, Lanzhou 730070, China;
  4. Gansu Provincial Engineering Laboratory for National Geographic State Monitoring, Lanzhou 730070, China;
  5. China Telecom Corporation Limited Sichuan Branch, Chengdu 610015, China
  • Received: 2021-01-18 Online: 2021-10-25 Published: 2021-11-13
  • Corresponding author: CHEN Yisong. E-mail: chenyisong@sctel.com.cn
  • About the author: SUN Licai (1994-), male, master's student; main research interest: acquisition and mining of web geographic information. E-mail: 0618642@stu.lzjtu.edu.cn
  • Supported by:
    Lanzhou Jiaotong University Excellent Platform (201806); National Key Research and Development Program of China (2017YFB0503502; 2017YBF0503601); Basic Scientific Research Fund of the Chinese Academy of Surveying and Mapping (AR2011)

Abstract: With the development of volunteered geographic information and Chinese address segmentation techniques, the quality of the resulting address elements needs to be evaluated. To address the difficulty of effectively evaluating the quality of address elements produced by segmenting Chinese address text, this paper proposes a credibility evaluation method for address elements supported by multi-source data and web retrieval. First, a Chinese word segmentation tool is used to segment the address elements and tag parts of speech; by analyzing word frequencies and part-of-speech combination patterns, the credibility of each element's naming structure is calculated. Second, based on large-scale address samples, road data, and POI data, the support that the multi-source data provide for each address element is mined and a data support degree is calculated. Third, a search engine is used to retrieve each address element quickly, and the returned results and their number are analyzed to calculate the element's web credibility. Finally, a comprehensive credibility model is proposed to compute the overall credibility of each address element. Experimental results show that the model and method can not only calculate the credibility of address elements in Chinese address text quickly and efficiently, but also effectively detect problems such as obscure or fake address elements, providing a reference for the automated checking and standardization of address elements.

Key words: multi-source data, address element, credibility evaluation, Chinese word segmentation, normalization
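
The abstract does not give the formula of the comprehensive credibility model, so the following is only an illustrative sketch of how three component scores of this kind might be normalized and combined. The weighted average, the min-max normalization, the `ElementScores` type, the function names, the weights, and the sample numbers are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale raw scores to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


@dataclass
class ElementScores:
    """Hypothetical per-element component scores, each already in [0, 1]."""
    name: str
    structure: float     # credibility of the naming structure
    data_support: float  # support from address samples, road data, POI data
    web: float           # credibility derived from search-engine results


def combined_credibility(
    s: ElementScores,
    weights: Tuple[float, float, float] = (0.3, 0.4, 0.3),  # assumed weights
) -> float:
    """Weighted average of the three component scores (an assumed model)."""
    w_struct, w_data, w_web = weights
    total = w_struct + w_data + w_web
    return (w_struct * s.structure + w_data * s.data_support + w_web * s.web) / total


# Made-up raw values for three placeholder road names.
names = ["road_a", "road_b", "road_c"]
structure = [0.9, 0.8, 0.2]
data_hits = min_max_normalize([120, 40, 0])      # matches in local datasets
web_hits = min_max_normalize([50000, 8000, 10])  # search result counts

elements = [
    ElementScores(n, st, d, w)
    for n, st, d, w in zip(names, structure, data_hits, web_hits)
]
ranked = sorted(elements, key=combined_credibility, reverse=True)
for e in ranked:
    print(f"{e.name}: {combined_credibility(e):.3f}")
```

Under this assumed scheme, an element with a plausible name structure but no support in either the local datasets or the search results ends up at the bottom of the ranking, which is the kind of obscure or fake element the abstract says the method aims to flag.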

CLC Number: