Bulletin of Surveying and Mapping, 2024, Vol. 0, Issue (9): 129-134. doi: 10.13474/j.cnki.11-2246.2024.0923

• Academic Research •

Geographic named entity recognition based on multi-dimensional feature learning and model fusion

MA Haoran, WANG Jinhua

  1. The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China
  • Received: 2024-07-08 Published: 2024-10-09
  • Corresponding author: WANG Jinhua. E-mail: 15802196002@139.com
  • About the author: MA Haoran (born 1998), male, master's student; main research interests include artificial intelligence, knowledge graphs, and big data. E-mail: 2730173394@qq.com

Abstract: Geographic named entity recognition is the core task of geographic information extraction, which in turn supports the construction of geographic information systems. Current research on geographic named entity recognition faces two main challenges. First, annotated data for geographic text is scarce, so general-purpose models that rely on large amounts of annotated data struggle to capture and recognize all potential named entities in geographic text. Second, labels in geographic data are sparse, so models often fail to distinguish the boundaries between geographic named entities and therefore cannot locate them accurately. To address these problems, this paper proposes AM_NER, a named entity recognition algorithm tailored to the features of geographic text. First, ALBERT, a lightweight pre-trained model suited to small-sample settings, is used to train word vectors so that the semantic information of the geographic domain is learned more comprehensively. Second, a neuron structure named M_NER is designed; based on the idea of model fusion, it uses multiple models to learn semantic features from different dimensions and thereby accurately identifies the boundaries of named entities. Compared with previous studies, AM_NER improves each evaluation metric on a geographic-domain dataset by 2.05% to 2.67%.

Key words: geographical named entity recognition, deep learning, feature learning, model fusion
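
The model-fusion idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how M_NER combines its models, so the averaging rule below is an assumption chosen for simplicity and is not the authors' method. The function names `fuse_token_probs` and `bio_to_spans`, the BIO label set, and the probability values for the two toy models are all hypothetical.

```python
from typing import Dict, List, Tuple

TokenProbs = Dict[str, float]  # label -> probability for a single token


def fuse_token_probs(model_outputs: List[List[TokenProbs]]) -> List[str]:
    """Sum per-token label probabilities across models and take the argmax.

    Assumption: simple averaging-style fusion; the paper's M_NER may differ.
    """
    n_tokens = len(model_outputs[0])
    fused = []
    for i in range(n_tokens):
        totals: Dict[str, float] = {}
        for output in model_outputs:
            for label, p in output[i].items():
                totals[label] = totals.get(label, 0.0) + p
        fused.append(max(totals, key=totals.get))
    return fused


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO labels into (start, end_exclusive, entity_type) spans."""
    spans = []
    start, etype = None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel closes the last span
        inside = lab.startswith("I-") and etype == lab[2:]
        if start is not None and not inside:
            spans.append((start, i, etype))
            start, etype = None, None
        if lab.startswith("B-"):
            start, etype = i, lab[2:]
    return spans


# Toy sentence: "Yangtze River flows through Wuhan" (hypothetical model outputs).
model_a = [  # misses the entity boundary at token 1 on its own
    {"B-LOC": 0.7, "I-LOC": 0.1, "O": 0.2},
    {"B-LOC": 0.1, "I-LOC": 0.4, "O": 0.5},
    {"B-LOC": 0.05, "I-LOC": 0.05, "O": 0.9},
    {"B-LOC": 0.05, "I-LOC": 0.05, "O": 0.9},
    {"B-LOC": 0.6, "I-LOC": 0.1, "O": 0.3},
]
model_b = [  # more confident about the boundary
    {"B-LOC": 0.6, "I-LOC": 0.2, "O": 0.2},
    {"B-LOC": 0.05, "I-LOC": 0.8, "O": 0.15},
    {"B-LOC": 0.1, "I-LOC": 0.1, "O": 0.8},
    {"B-LOC": 0.03, "I-LOC": 0.02, "O": 0.95},
    {"B-LOC": 0.7, "I-LOC": 0.1, "O": 0.2},
]

labels = fuse_token_probs([model_a, model_b])
spans = bio_to_spans(labels)
print(labels)
print(spans)
```

In this toy case the first model alone would label the second token `O` and cut the entity short, while the fused scores recover the full two-token span. This is one way that combining models with different views of the text could sharpen entity boundaries.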

CLC number: