Bulletin of Surveying and Mapping (测绘通报) ›› 2018, Vol. 0 ›› Issue (2): 94-98. doi: 10.13474/j.cnki.11-2246.2018.0051

• Industry Observation •

Simulating the Spatial Distribution of Disease Using Search Engine Data

肖屹1, 何宗宜1, 苗静2, 潘峰1,3, 杨好1   

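  1. School of Resource and Environmental Science, Wuhan University, Wuhan, Hubei 430079, China;
    2. Wuhan Geomatic Institute, Wuhan, Hubei 430022, China;
    3. Xi'an Information Technique Institute of Surveying and Mapping, Xi'an, Shaanxi 710054, China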
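  • Received: 2017-07-18 Revised: 2017-08-30 Online: 2018-02-25 Published: 2018-03-06
  • About the author: XIAO Yi (1994-), male, master's student; his main research interests are spatial statistics and spatial data mining. E-mail: arsrvp@foxmail.com
  • Supported by:

    National Natural Science Foundation of China (41071290); Humanities and Social Sciences Research Project of the Ministry of Education (14YJCZH028)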

Modelling the Spatial Distribution of Epidemic by Search Engine Data

XIAO Yi1, HE Zongyi1, MIAO Jing2, PAN Feng1,3, YANG Hao1   

  1. School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China;
    2. Wuhan Geomatic Institute, Wuhan 430022, China;
    3. Xi'an Information Technique Institute of Surveying and Mapping, Xi'an 710054, China
  • Received: 2017-07-18 Revised: 2017-08-30 Online: 2018-02-25 Published: 2018-03-06

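Abstract:

The Internet records people's daily lives, and analyzing and mining search engine data that carry location information can reveal the geographic information hidden in them. This paper analyzes the correlation between the monthly number of influenza cases in each Chinese province and the Baidu search index of related keywords. The Baidu indices of the keywords with the highest correlation are taken as explanatory variables and the number of cases as the dependent variable. After principal component analysis is applied to remove collinearity among the variables, ordinary least squares regression (OLS), geographically weighted regression (GWR), and geographically and temporally weighted regression (GTWR) are each used to build a spatial distribution model of influenza case counts. Goodness of fit rises from 0.737 for OLS and 0.915 for GWR to 0.959 for GTWR, and the Akaike information criterion (AIC) also indicates that the GTWR model is clearly superior to the OLS and GWR models. Validation shows that the GTWR model accurately identifies areas of high influenza incidence. Combining this method with search engine data models the spatial distribution of influenza well and provides a predictive model and statistical interpretation for research in spatial epidemiology.

Keywords: geographically and temporally weighted regression model, search engine data, influenza, spatial distribution model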
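
For readers less familiar with the third model named above, the block below sketches the commonly used GTWR formulation. The abstract does not state the kernel, the bandwidth, or the space-time scale factors the authors used, so the Gaussian kernel, the bandwidth $h$, and the factors $\lambda$ and $\mu$ are assumptions made for illustration. Here $(u_i, v_i)$ is the location of observation $i$, $t_i$ its time, $x_{ik}$ the $k$-th explanatory variable, and $y_i$ the case count.

```latex
\begin{align}
y_i &= \beta_0(u_i, v_i, t_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i, t_i)\, x_{ik} + \varepsilon_i \\
\hat{\boldsymbol{\beta}}(u_i, v_i, t_i) &= \left(X^{\top} W_i X\right)^{-1} X^{\top} W_i\, y,
  \qquad W_i = \operatorname{diag}(w_{i1}, \dots, w_{in}) \\
w_{ij} &= \exp\!\left(-\frac{d_{ij}^{2}}{h^{2}}\right),
  \qquad d_{ij}^{2} = \lambda\left[(u_i - u_j)^2 + (v_i - v_j)^2\right] + \mu\,(t_i - t_j)^2
\end{align}
```

Under this formulation, setting $\mu = 0$ drops the temporal term and gives GWR, while making all weights equal gives a single global fit, which is OLS.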

Abstract:

The Internet records people's daily lives, and analyzing and mining location-tagged search engine query data can uncover valuable geographic information hidden in them. In this paper, the correlation between monthly influenza case counts in each Chinese province and the Baidu search index of related keywords was calculated. The indices of the most strongly correlated keywords were chosen as explanatory variables, and the influenza case count was chosen as the dependent variable. Principal component analysis was used to eliminate multicollinearity among the variables before spatial distribution models of influenza were constructed with ordinary least squares regression (OLS), geographically weighted regression (GWR), and geographically and temporally weighted regression (GTWR). The GTWR model showed a better goodness of fit (0.959) than the OLS (0.737) and GWR (0.915) models. The Akaike information criterion (AIC) likewise indicated that the GTWR model clearly outperformed the OLS and GWR models. Validation results showed that the GTWR model can accurately identify areas of high influenza prevalence. Combining the GTWR model with search engine query data can therefore model the spatial distribution of influenza well and provides a prediction model and statistical explanation for research in spatial epidemiology.

Key words: geographically and temporally weighted regression, search engine data, influenza, spatial distribution model
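
The workflow summarized in the abstract (decorrelate the explanatory variables with principal component analysis, then compare a global regression with a locally weighted one) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the data are synthetic stand-ins rather than Baidu index or case data, and the Gaussian kernel, the fixed bandwidth, and all function names (`pca_scores`, `ols_fit_predict`, `gwr_fit_predict`) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's data): n regions with coordinates
# and three strongly correlated "search index" columns.
n = 200
coords = rng.uniform(0.0, 10.0, size=(n, 2))
base = rng.normal(size=n)
X_raw = np.column_stack([base + 0.1 * rng.normal(size=n) for _ in range(3)])


def pca_scores(X, k):
    """Project the standardized columns of X onto the top-k principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T


def r_squared(y, y_hat):
    """Coefficient of determination of fitted values y_hat against y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def ols_fit_predict(X, y):
    """Global OLS with an intercept; returns in-sample fitted values."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ beta


def gwr_fit_predict(X, y, coords, bandwidth):
    """Basic GWR: one Gaussian-kernel weighted least squares fit per location."""
    A = np.column_stack([np.ones(len(X)), X])
    y_hat = np.empty(len(y))
    for i in range(len(y)):
        d2 = np.sum((coords - coords[i]) ** 2, axis=1)  # squared distances to point i
        sw = np.sqrt(np.exp(-d2 / bandwidth**2))        # square roots of kernel weights
        # Scaling rows by sqrt(w) turns weighted least squares into ordinary lstsq.
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        y_hat[i] = A[i] @ beta
    return y_hat


# One principal component replaces the three collinear columns.
pcs = pca_scores(X_raw, k=1)

# Synthetic response whose slope drifts from west to east, so a single
# global coefficient cannot describe it well.
slope = 1.0 + 0.5 * coords[:, 0]
y = 2.0 + slope * pcs[:, 0] + 0.1 * rng.normal(size=n)

r2_ols = r_squared(y, ols_fit_predict(pcs, y))
r2_gwr = r_squared(y, gwr_fit_predict(pcs, y, coords, bandwidth=2.0))
print(f"OLS R^2 = {r2_ols:.3f}, GWR R^2 = {r2_gwr:.3f}")
```

Extending the sketch toward GTWR would mean adding a time stamp per observation and replacing `d2` with a combined space-time distance; in practice the bandwidth would be chosen by cross-validation or AIC rather than fixed by hand.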

CLC Number: