Bulletin of Surveying and Mapping ›› 2023, Vol. 0 ›› Issue (6): 155-160. doi: 10.13474/j.cnki.11-2246.2023.0186

Application of improved random forest model in population spatialization

JIANG Xueli1, XIONG Yongliang1, GUO Hongmei2, ZHAO Zhen2, ZHANG Ying2, MENG Yatian1   

  1. Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China;
  2. Earthquake Administration of Sichuan Province, Chengdu 610041, China
  • Received: 2022-07-05  Published: 2023-07-05

Abstract: Population spatialization methods based on the random forest model do not account for the unbalanced spatial distribution of population, and Bootstrap sampling exacerbates the imbalance of the training samples, making them unrepresentative and lowering prediction accuracy. To address this problem, this study takes Chengdu city as an example. The characteristic factors affecting population distribution are extracted through correlation analysis, and the data set is clustered with the K-means++ clustering algorithm. An equal amount of data is then drawn from each cluster by Bootstrap sampling and merged into a training subset to construct an improved random forest model, which is compared with the traditional random forest model. Finally, the 2020 population data of Chengdu city are spatialized using the improved random forest model, and the accuracy of the results is compared with that of the WorldPop dataset. The results show that the overall accuracy of the population spatialization model based on the improved random forest reaches 80.5%, about 3.4% higher than before the improvement, indicating that the improvement effectively raises prediction accuracy. Compared with the WorldPop dataset, the population spatialization results of the improved random forest model are better in terms of fit and accuracy.
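
The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name balanced_bootstrap, the toy cluster labels, and the per-cluster sample size are all hypothetical, and the cluster labels are assumed to come from a prior K-means++ run (for example, scikit-learn's KMeans with init="k-means++"). The sketch only shows how an equal number of records might be drawn with replacement from each cluster and merged into one training subset.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, n_per_cluster, rng):
    """Draw n_per_cluster indices with replacement from every cluster.

    labels[i] is the (assumed, precomputed) cluster label of record i.
    Returns the merged list of sampled record indices.
    """
    # Group record indices by their cluster label.
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)

    # Bootstrap (sample with replacement) the same amount from each cluster.
    sample = []
    for lab in sorted(by_cluster):
        sample.extend(rng.choices(by_cluster[lab], k=n_per_cluster))
    return sample


# Hypothetical, strongly unbalanced labels: cluster 0 dominates the data set.
labels = [0] * 90 + [1] * 8 + [2] * 2
rng = random.Random(0)  # fixed seed so the sketch is reproducible

subset = balanced_bootstrap(labels, n_per_cluster=30, rng=rng)

# Each cluster contributes the same number of draws to the training subset.
counts = {lab: sum(1 for i in subset if labels[i] == lab) for lab in set(labels)}
print(counts)
```

In a full model, one such subset would presumably be drawn per decision tree, in place of the plain Bootstrap sample over the whole data set; that wiring is left out here because the abstract does not specify it.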

Key words: population spatialization, random forest, K-means++ clustering, Chengdu city
