Bulletin of Surveying and Mapping ›› 2023, Vol. 0 ›› Issue (6): 155-160. doi: 10.13474/j.cnki.11-2246.2023.0186

• Technical Exchange •

Application of improved random forest model in population spatialization

JIANG Xueli1, XIONG Yongliang1, GUO Hongmei2, ZHAO Zhen2, ZHANG Ying2, MENG Yatian1

  1. Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China;
  2. Earthquake Administration of Sichuan Province, Chengdu 610041, China
  • Received: 2022-07-05  Published: 2023-07-05
  • Corresponding author: ZHAO Zhen. E-mail: 827387315@qq.com
  • About the author: JIANG Xueli (born 1998), female, master's degree; main research interest: population data spatialization. E-mail: 1679685419@qq.com
  • Funding:
    National Key Research and Development Program of China (2020YFA07106003-07); National Natural Science Foundation of China (42061073); Sichuan Earthquake Science and Technology Innovation Team Special Project (201901)

Abstract: Population spatialization methods based on the random forest model do not account for the imbalanced spatial distribution of population, and Bootstrap sampling aggravates the imbalance of the samples, leaving the training subsets unrepresentative and lowering prediction accuracy. To address this problem, this study takes Chengdu as an example. Feature factors that influence population distribution are extracted through correlation analysis, and the data set is clustered with the K-means++ algorithm. Bootstrap sampling then draws an equal amount of data from each cluster, and the draws are merged into a training subset used to build an improved random forest model, which is compared with the traditional random forest model. Finally, the improved model is used to spatialize the 2020 population data of Chengdu, and the result is compared with the WorldPop dataset for accuracy. The results show that the overall accuracy of the population spatialization model based on the improved random forest reaches 80.5%, about 3.4% higher than before the improvement, indicating that the improvement effectively raises prediction accuracy. Compared with the WorldPop dataset, the spatialization results of the improved random forest model are better in both goodness of fit and accuracy.

Key words: population spatialization, random forest, K-means++ clustering, Chengdu city
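
The cluster-balanced sampling step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `balanced_bootstrap`, the parameter `per_cluster`, and the toy labels are hypothetical, and the cluster labels are assumed to come from a separate K-means++ run that is not shown.

```python
import random
from collections import defaultdict


def balanced_bootstrap(cluster_labels, per_cluster, rng):
    """Draw `per_cluster` sample indices with replacement from every cluster.

    cluster_labels[i] is the cluster id of sample i (assumed to come from a
    K-means++ run performed beforehand). Returns indices into the data set.
    """
    members = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        members[label].append(index)

    subset = []
    for label in sorted(members):
        # Sampling with replacement keeps the Bootstrap character of the draw
        # while giving every cluster the same weight in the training subset.
        subset.extend(rng.choices(members[label], k=per_cluster))
    return subset


# Hypothetical, heavily imbalanced labels: 90 samples in cluster 0,
# 8 in cluster 1, and 2 in cluster 2.
labels = [0] * 90 + [1] * 8 + [2] * 2
rng = random.Random(0)

# One training subset per tree; three trees are shown here.
subsets = [balanced_bootstrap(labels, per_cluster=10, rng=rng) for _ in range(3)]

# Count how many drawn samples fall in each cluster for the first subset.
counts = [sum(1 for i in subsets[0] if labels[i] == c) for c in (0, 1, 2)]
print(counts)  # [10, 10, 10]
```

Each returned index list would serve as the training subset of one decision tree. The number of samples drawn per cluster is not given in the abstract and is a free choice in this sketch.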

CLC number: