Bulletin of Surveying and Mapping ›› 2023, Vol. 0 ›› Issue (8): 155-160,177. doi: 10.13474/j.cnki.11-2246.2023.0250

• Technical Exchange •

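Geographical named entity recognition based on human-in-the-loop learning enhancement

YANG Ying1, QIU Qinjun2,3,4, XIE Zhong3,4, TIAN Miao5, ZHENG Shiyu3,4, ZHENG Shuai5

  1. Shenzhen Data Management Center of Planning and Natural Resources, Shenzhen 518000, Guangdong, China;
    2. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518034, Guangdong, China;
    3. School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430074, Hubei, China;
    4. National Local Joint Engineering Laboratory of Geographic Information System, Wuhan 430074, Hubei, China;
    5. College of Computer and Information Technology, China Three Gorges University, Yichang 443002, Hubei, China
  • Received: 2023-05-13 Published: 2023-09-01
  • Corresponding author: QIU Qinjun. E-mail: qiuqinjun@cug.edu.cn
  • About the author: YANG Ying (1974-), female, senior engineer, mainly engaged in research on geographic information systems, remote sensing, e-government, and digital cities. E-mail: szyy1205@163.com
  • Funding:
    National Key Research and Development Program of China (2022YFB3904200; 2022YFF0711601); National Natural Science Foundation of China, Original Exploration Program (42050101); Natural Science Foundation of Hubei Province (2022CFB640); Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (KF-2022-07-014)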

Geographical named entity recognition based on human-in-the-loop learning enhancement

YANG Ying1, QIU Qinjun2,3,4, XIE Zhong3,4, TIAN Miao5, ZHENG Shiyu3,4, ZHENG Shuai5   

  1. Shenzhen Data Management Center of Planning and Natural Resources, Shenzhen 518000, China;
    2. Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518034, China;
    3. School of Computer Science, China University of Geosciences, Wuhan 430074, China;
    4. National Local Joint Engineering Laboratory of Geographic Information System, Wuhan 430074, China;
    5. College of Computer and Information Technology, China Three Gorges University, Yichang 443002, China
  • Received: 2023-05-13 Published: 2023-09-01

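Abstract: Geographical named entity recognition is an important step in building high-quality geographic knowledge graphs and is widely applied in geocoding, semantic retrieval, and geographic knowledge reasoning. Mainstream deep learning models suffer from time-consuming, labor-intensive corpus annotation and poor interpretability. To exploit the ability of the human-in-the-loop mechanism to help a model learn from a small number of samples, this paper proposes a geographical named entity recognition method enhanced by human-in-the-loop learning. The method takes partially labeled and unlabeled geographic corpora as input, trains a BERT-BiLSTM-CRF model, and applies it to the corpus awaiting annotation. Sentences that the model recognizes incorrectly are corrected through human intervention, and the corrected sentences are fed back into the model for iterative training. The process finally yields a standard geographical named entity dataset and an extraction model reinforced by the human-in-the-loop process. Model performance is evaluated on geographic encyclopedia data: the method reaches a recognition accuracy above 90% for most geographical named entities. Compared with existing deep learning models, it requires only a small number of labeled samples, achieves better recognition results, and maintains good performance across multiple types of geographical named entities.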

Key words: geographical named entity recognition, human-in-the-loop, deep learning, pre-trained models, BERT-BiLSTM-CRF

Abstract: Geographical named entity recognition is an important step in constructing high-quality geographic knowledge graphs and is widely used in geocoding, semantic retrieval, and geographic knowledge inference. Mainstream deep learning models require time-consuming, labor-intensive corpus annotation and offer poor interpretability. To take advantage of the human-in-the-loop mechanism, which helps a model learn from a small number of samples, a geographical named entity recognition method based on human-in-the-loop learning enhancement is proposed. Partially labeled and unlabeled geographic corpora are used as input; a BERT-BiLSTM-CRF model is trained and then applied to the corpus awaiting annotation; sentences that the model recognizes incorrectly are corrected through human intervention; and the corrected sentences are fed back to the model for retraining. After several iterations, a standard geographical named entity dataset and an extraction model reinforced by the human-in-the-loop process are obtained. Model performance is evaluated on geographic encyclopedia data, and the method achieves an accuracy above 90% for most geographical named entities.

Key words: geographical named entity recognition, human-in-the-loop, deep learning, pre-trained models, BERT-BiLSTM-CRF

CLC number:
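
The iterative procedure summarized in the abstract (train, predict, correct the errors by hand, retrain) can be illustrated with a minimal Python sketch. It is not the authors' implementation: `LookupTagger` is a toy stand-in for the BERT-BiLSTM-CRF model that merely memorizes token labels, and `human_in_the_loop`, `review`, `max_rounds`, and the sample sentences are hypothetical names and data introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Sentence = List[str]             # tokens of one sentence
Tags = List[str]                 # one label per token, e.g. "LOC" or "O"
Example = Tuple[Sentence, Tags]


class LookupTagger:
    """Toy stand-in (assumption) for a BERT-BiLSTM-CRF tagger."""

    def __init__(self) -> None:
        self.lexicon: Dict[str, str] = {}

    def fit(self, examples: List[Example]) -> None:
        # Memorize the label seen for each token in the training data.
        self.lexicon = {}
        for tokens, tags in examples:
            for token, tag in zip(tokens, tags):
                self.lexicon[token] = tag

    def predict(self, tokens: Sentence) -> Tags:
        # Unknown tokens default to the "outside" label.
        return [self.lexicon.get(token, "O") for token in tokens]


def human_in_the_loop(
    model: LookupTagger,
    labeled: List[Example],
    unlabeled: List[Sentence],
    review: Callable[[Sentence, Tags], Tags],
    max_rounds: int = 5,
) -> Tuple[List[Example], List[int]]:
    """Train, predict, collect reviewer corrections, retrain; repeat.

    Returns the accumulated training set and the number of corrected
    sentences in each round.
    """
    dataset = list(labeled)
    corrections_per_round: List[int] = []
    for _ in range(max_rounds):
        model.fit(dataset)
        corrected: List[Example] = []
        for tokens in unlabeled:
            predicted = model.predict(tokens)
            reviewed = review(tokens, predicted)
            if reviewed != predicted:      # the reviewer changed a label
                corrected.append((tokens, reviewed))
        corrections_per_round.append(len(corrected))
        if not corrected:                  # no errors left to correct
            break
        dataset.extend(corrected)
    model.fit(dataset)                     # final model sees every correction
    return dataset, corrections_per_round


# Usage with a simulated reviewer who knows the gold labels (hypothetical data).
gold = {"Wuhan": "LOC", "Yangtze": "LOC"}


def oracle(tokens: Sentence, predicted: Tags) -> Tags:
    return [gold.get(token, "O") for token in tokens]


seed = [(["Wuhan", "is", "large"], ["LOC", "O", "O"])]
pool = [["Yangtze", "flows", "past", "Wuhan"], ["Wuhan", "is", "inland"]]
model = LookupTagger()
dataset, counts = human_in_the_loop(model, seed, pool, oracle)
```

In this sketch only the sentences the reviewer had to correct are added to the training set, so annotation effort is spent where the model is wrong. A real system would replace `LookupTagger` with a trained sequence labeler and `oracle` with an annotation interface.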