Bulletin of Surveying and Mapping ›› 2023, Vol. 0 ›› Issue (8): 155-160, 177. doi: 10.13474/j.cnki.11-2246.2023.0250

Geographical named entity recognition based on human-in-the-loop learning enhancement

YANG Ying1, QIU Qinjun2,3,4, XIE Zhong3,4, TIAN Miao5, ZHENG Shiyu3,4, ZHENG Shuai5   

    1. Shenzhen Data Management Center of Planning and Natural Resources, Shenzhen 518000, China;
    2. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518034, China;
    3. School of Computer Science, China University of Geosciences, Wuhan 430074, China;
    4. National Local Joint Engineering Laboratory of Geographic Information System, Wuhan 430074, China;
    5. College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China
  • Received: 2023-05-13; Published: 2023-09-01

Abstract: Geographical named entity recognition is an important part of constructing high-quality geographic knowledge graphs and is widely used in geocoding, semantic retrieval, and geographic knowledge inference. Mainstream deep learning models suffer from two problems: annotating the training corpus is time-consuming and laborious, and the models are poorly interpretable. To use the human-in-the-loop mechanism to improve learning models with a small number of samples, a geographical named entity recognition method based on human-in-the-loop learning enhancement is proposed. A geographic corpus consisting of partially labeled and unlabeled text is used as input; a BERT-BiLSTM-CRF model is trained on it and then used to recognize entities, producing a labeled corpus. Sentences that the model recognizes incorrectly are corrected through human intervention, and the corrected sentences are fed back to the learning model for retraining. After several iterations, the process yields a standard geographical named entity dataset and an extraction model reinforced by the human-in-the-loop cycle. The performance of the model is evaluated using geographic encyclopedia data as an example, and the accuracy of the method exceeds 90% for most of the geographical named entity recognition results.
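
The iteration described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `LexiconTagger` is a hypothetical toy stand-in for the BERT-BiLSTM-CRF model, `review` is a hypothetical callback that plays the role of the human annotator, and the BIO tag names, the stopping rule, and `max_rounds` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Sentence = List[str]                 # tokens of one sentence
Tags = List[str]                     # one BIO tag per token
Example = Tuple[Sentence, Tags]


class LexiconTagger:
    """Toy stand-in for BERT-BiLSTM-CRF (assumption): memorises token -> tag."""

    def __init__(self) -> None:
        self.lexicon: Dict[str, str] = {}

    def fit(self, examples: List[Example]) -> None:
        for tokens, tags in examples:
            for token, tag in zip(tokens, tags):
                self.lexicon[token] = tag

    def predict(self, tokens: Sentence) -> Tags:
        # Unknown tokens default to "O" (outside any entity).
        return [self.lexicon.get(token, "O") for token in tokens]


def human_in_the_loop(
    model: LexiconTagger,
    seed: List[Example],
    unlabeled: List[Sentence],
    review: Callable[[Sentence, Tags], Tags],
    max_rounds: int = 5,
) -> Tuple[LexiconTagger, List[Example]]:
    """Train, predict, collect human corrections, retrain; repeat."""
    training = list(seed)
    for _ in range(max_rounds):
        model.fit(training)
        corrections: List[Example] = []
        for tokens in unlabeled:
            predicted = model.predict(tokens)
            corrected = review(tokens, predicted)   # human intervention
            if corrected != predicted:              # model was wrong
                corrections.append((tokens, corrected))
        if not corrections:                         # nothing left to fix
            break
        training.extend(corrections)                # feed corrections back
    model.fit(training)  # make sure the last corrections are learned
    dataset = list(seed) + [(t, model.predict(t)) for t in unlabeled]
    return model, dataset


# Usage with a simulated annotator that knows the correct tags.
gold = {"Wuhan": "B-LOC", "Yichang": "B-LOC"}
seed = [(["Wuhan", "is", "large"], ["B-LOC", "O", "O"])]
unlabeled = [["Yichang", "is", "near", "Wuhan"]]
model, dataset = human_in_the_loop(
    LexiconTagger(), seed, unlabeled,
    review=lambda tokens, predicted: [gold.get(t, "O") for t in tokens],
)
```

In this sketch only the sentences the reviewer had to correct are added to the training set, which mirrors the abstract's point that human effort is spent on the model's mistakes rather than on labeling the whole corpus.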

Key words: geographical named entity recognition, human-in-the-loop, deep learning, pre-trained models, BERT-BiLSTM-CRF

CLC Number: