Bulletin of Surveying and Mapping (测绘通报), 2018, Vol. 0, Issue (2): 94-98. doi: 10.13474/j.cnki.11-2246.2018.0051

Modelling the Spatial Distribution of Epidemic by Search Engine Data

XIAO Yi¹, HE Zongyi¹, MIAO Jing², PAN Feng¹,³, YANG Hao¹

  1. School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China;
  2. Wuhan Geomatic Institute, Wuhan 430022, China;
  3. Xi'an Information Technique Institute of Surveying and Mapping, Xi'an 710054, China
  • Received: 2017-07-18 Revised: 2017-08-30 Online: 2018-02-25 Published: 2018-03-06

Abstract:

The Internet records people's daily lives, and analyzing and mining location-tagged search engine query data can reveal valuable geographic information hidden in it. In this paper, the correlation between the monthly influenza case data for each Chinese province and the Baidu search index of related keywords was calculated; the index of the most strongly correlated keyword was chosen as the explanatory variable, and the influenza case data as the dependent variable. Principal component analysis was used to eliminate the effect of multicollinearity among variables before the spatial distribution model of influenza was constructed by ordinary least squares regression (OLS), geographically weighted regression (GWR), and geographically and temporally weighted regression (GTWR). The GTWR model demonstrated a better goodness-of-fit (0.959) than the OLS (0.737) and GWR (0.915) models. The Akaike information criterion (AIC) test also supported that the improvement of GTWR over the OLS and GWR models was statistically significant. Validation results showed that the GTWR model can accurately identify areas of high influenza prevalence. This demonstrates that combining the GTWR model with search engine query data can model the spatial distribution of influenza accurately and provide a prediction model and statistical explanation for epidemiological studies.
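
The workflow summarized in the abstract (pick the keyword index most correlated with case counts, then compare a global OLS fit with a locally weighted fit) can be sketched roughly as follows. This is only an illustrative sketch on synthetic data, not the authors' implementation: the variable names (`flu_index`, `unrelated_index`), the Gaussian kernel, the fixed bandwidth, and the data-generating process are all assumptions made for the example, and the principal component analysis and temporal weighting steps are left out.

```python
import numpy as np


def pick_keyword(cases, indices):
    """Return the keyword whose index has the largest |Pearson r| with cases."""
    corr = {name: np.corrcoef(series, cases)[0, 1] for name, series in indices.items()}
    best = max(corr, key=lambda name: abs(corr[name]))
    return best, corr


def design(x):
    """Design matrix with an intercept column and one explanatory variable."""
    return np.column_stack([np.ones(len(x)), x])


def ols_predict(x, y):
    """Global OLS fit; returns in-sample predictions."""
    A = design(x)
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    return A @ beta


def gwr_predict(coords, x, y, bandwidth):
    """Basic GWR: one weighted least-squares fit per location (Gaussian kernel)."""
    A = design(x)
    y_hat = np.empty(len(y))
    for i in range(len(y)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        # Square roots of the kernel weights turn weighted LS into ordinary LS.
        sw = np.sqrt(np.exp(-0.5 * (dist / bandwidth) ** 2))
        beta_i = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        y_hat[i] = A[i] @ beta_i
    return y_hat


def r_squared(y, y_hat):
    """Coefficient of determination of in-sample predictions."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


# Synthetic data (hypothetical, not from the paper): the effect of the search
# index on case counts drifts from west to east, which a global model misses.
rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0, 10, size=(n, 2))
flu_index = rng.uniform(0, 100, size=n)
unrelated_index = rng.uniform(0, 100, size=n)
slope = 1.0 + 0.2 * coords[:, 0]
cases = slope * flu_index + rng.normal(0, 5, size=n)

candidates = {"flu_index": flu_index, "unrelated_index": unrelated_index}
best, corr = pick_keyword(cases, candidates)
x = candidates[best]
r2_ols = r_squared(cases, ols_predict(x, cases))
r2_gwr = r_squared(cases, gwr_predict(coords, x, cases, bandwidth=2.0))
print(best, round(r2_ols, 3), round(r2_gwr, 3))
```

A GTWR variant would additionally down-weight observations that are distant in time, for example by combining spatial and temporal distance before applying the kernel; that extension, along with bandwidth selection by AIC, is omitted from this sketch.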

Key words: geographically and temporally weighted regression, search engine data, influenza, spatial distribution model
