Bulletin of Surveying and Mapping ›› 2022, Vol. 0 ›› Issue (3): 101-106. doi: 10.13474/j.cnki.11-2246.2022.0085


A Chinese addresses matching method based on the pseudo-semantic model

YU Ting1,2, WANG Duo2, CHEN Qin1   

  1. The Third Research Institute of Ministry of Public Security, Shanghai 200031, China;
    2. Fudan University, Shanghai 200433, China
  • Received: 2021-03-16 Revised: 2022-01-21 Online: 2022-03-25 Published: 2022-04-01

Abstract: Because an address element can be written in many ways, such as abbreviations and logograms, address matching is a difficult task, especially for Chinese addresses. One important family of address matching methods relies on similarity measures. However, traditional similarity methods focus on overlapping characters and cannot handle such variant expressions. The other widely used approach is based on deep learning, but it is difficult to generate a large number of training samples for it. In this paper, a bi-directional long short-term memory network with a conditional random field (BiLSTM-CRF) is applied to segment Chinese addresses. Then, a new similarity measure, named pseudo-semantic similarity, is constructed to address the problem of abbreviations and logograms. According to current results, pseudo-semantic similarity performs better than other similarity models in the matching process, and its recall and precision both reach 0.9 on the test set. Sample cases show that the pseudo-semantic model can recognize abbreviations and logograms of address elements.
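
To illustrate the limitation of character-overlap similarity noted in the abstract, the following is a minimal sketch, not the method of the paper. It assumes a plain Jaccard similarity over character sets as a stand-in for "traditional" overlap measures; the function name char_jaccard and the example strings are hypothetical and chosen only for illustration.

```python
def char_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over the sets of characters in two strings.

    Illustrative stand-in for an overlap-based similarity; the paper's
    pseudo-semantic similarity is NOT reproduced here.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical address elements (assumptions, not data from the paper).
full_name = "北京大学"     # full form
abbreviation = "北大"      # common abbreviation of the same element
lookalike = "北京大街"     # different element sharing many characters

score_abbreviation = char_jaccard(full_name, abbreviation)
score_lookalike = char_jaccard(full_name, lookalike)

print(score_abbreviation, score_lookalike)
```

In this sketch the abbreviation shares fewer characters with the full form than the unrelated look-alike does, so an overlap-only measure ranks the wrong candidate higher. This is the kind of case that motivates a similarity measure that goes beyond character overlap.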

Key words: BiLSTM-CRF; resolution of address elements; pseudo-semantic model; address matching; address standardization
