Original Article: Jayson S. Jia, Xin Lu, Yun Yuan, Ge Xu, Jianmin Jia & Nicholas A. Christakis. Population flow drives spatio-temporal distribution of COVID-19 in China. Nature. 29 April 2020.
Author of summary: Francesco Trapani; Reviewer: Sonia Fanelli
In this work, an epidemiological model based on mobile-phone data tracking movements out of Wuhan (11,478,484 movements between January 1 and 24, 2020) is used to predict the spread of COVID-19 in China. The distribution of population outflow from Wuhan accurately forecasts the relative frequency and geographic distribution of COVID-19 infections across all of China through February 19, 2020. Models fitted to mobility data are thus shown to be an effective tool for identifying high-transmission-risk zones.
Nationwide mobile-phone data were used to track population outflow from Wuhan to 296 prefectures in 31 provinces in the period preceding the Wuhan quarantine (January 23, 2020). In total, this dataset accounts for 11,478,484 counts of movements from Wuhan. More specifically, the authors focused on the aggregate population flow, i.e. the total count of people entering a given prefecture from Wuhan over the whole observation period.
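The aggregate population flow is simply a per-destination sum over the observation window. Below is a minimal Python sketch. It assumes the raw data arrive as hypothetical `(day, prefecture, count)` records, since the actual format of the mobile-phone dataset is not described here. The prefecture names and counts are invented for illustration:

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (day, destination prefecture, people leaving Wuhan that day).
records = [
    (date(2020, 1, 1), "Prefecture A", 1200),
    (date(2020, 1, 1), "Prefecture B", 300),
    (date(2020, 1, 2), "Prefecture A", 1500),
    (date(2020, 1, 24), "Prefecture B", 50),
    (date(2020, 1, 25), "Prefecture A", 10),  # outside the window, ignored
]


def aggregate_outflow(records, start, end):
    """Total movements from Wuhan per destination prefecture over [start, end]."""
    totals = defaultdict(int)
    for day, prefecture, count in records:
        if start <= day <= end:
            totals[prefecture] += count
    return dict(totals)


totals = aggregate_outflow(records, date(2020, 1, 1), date(2020, 1, 24))
print(totals)  # {'Prefecture A': 2700, 'Prefecture B': 350}
```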
First, the efficacy of the quarantine was assessed. The authors reported:
- 52% drop in inter-provincial population outflow between January 22 and 23,
- 38% drop in intra-provincial population outflow between January 22 and 23,
- 94% drop in inter-provincial population outflow between January 23 and 24,
- 84% drop in intra-provincial population outflow between January 23 and 24.
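The summary does not say whether the January 23 to 24 drops are measured against January 23 or against January 22. If we assume each percentage is a day-over-day change, the two drops compound multiplicatively. The sketch below computes the implied cumulative drop relative to January 22 under that assumption. The helper name is hypothetical:

```python
def compound_drop(*daily_drops_pct: float) -> float:
    """Overall percentage decrease after successive day-over-day drops.

    ASSUMPTION: each drop is relative to the previous day's flow.
    """
    remaining = 1.0
    for drop in daily_drops_pct:
        remaining *= 1.0 - drop / 100.0  # fraction of the previous day's flow left
    return 100.0 * (1.0 - remaining)


# Day-over-day drops from the summary (Jan 22->23, then Jan 23->24).
inter = compound_drop(52, 94)
intra = compound_drop(38, 84)
print(f"inter-provincial: {inter:.1f}% below the Jan 22 level")
print(f"intra-provincial: {intra:.1f}% below the Jan 22 level")
```

Under that reading, inter-provincial outflow on January 24 would be roughly 97% below its January 22 level, and intra-provincial outflow roughly 90% below.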
With the imposition of the quarantine, population outflow from Wuhan almost completely stopped (the average daily outflow thereafter was 1,087 people to all prefectures outside Hubei).
The population flow dataset was then combined with the number and geographical location of confirmed COVID-19 cases nationwide. The cumulative number of confirmed infections in each prefecture is highly correlated with the aggregate population outflow from Wuhan to that prefecture between January 1 and 24, and this correlation increases over time:
- r = 0.522 on January 24
- r = 0.919 on February 5
- r = 0.952 on February 19
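The summary reports r without naming the statistic. Assuming it is Pearson's correlation coefficient computed across prefectures, the sketch below shows how it is obtained from aggregate outflow and cumulative case counts. The four prefectures and all numbers are invented. They were chosen only to mimic a correlation that strengthens as cases accumulate:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for four prefectures.
outflow = [100, 200, 400, 800]   # aggregate outflow from Wuhan, Jan 1-24
cases_early = [3, 1, 5, 4]       # cumulative confirmed cases at an early date
cases_late = [10, 22, 38, 81]    # cumulative confirmed cases at a later date

r_early = pearson_r(outflow, cases_early)
r_late = pearson_r(outflow, cases_late)
print(f"early: r = {r_early:.3f}, late: r = {r_late:.3f}")
```

On Python 3.10+, `statistics.correlation` computes the same Pearson coefficient.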
Two models (one cross-sectional, one dynamical) were built to evaluate how well aggregate population outflow from Wuhan predicts the distribution of COVID-19 infections across China. The authors call this framework the “risk source” model. The cross-sectional model forecasts the distribution of infections at a given point in time (using the count of infections on a given day), while the dynamical model predicts the whole temporal trend of the epidemic.
- The cross-sectional model predicted the epidemic distribution on January 24 with a coefficient of determination R² = 0.772, and with R² = 0.946 on February 19;
- The dynamical model predicts the temporal trend of the epidemic distribution (January 1 to January 24) with R² = 0.927.
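The summary does not give the functional form of the “risk source” model. Purely as an illustration, the sketch below fits a cross-sectional power law, log(1 + cases) = a + b·log(1 + outflow), by ordinary least squares and reports R². The log transform, the +1 offset (which handles prefectures with zero cases) and the function names are all assumptions, not the authors' specification:

```python
import math


def fit_power_law(outflow, cases):
    """ASSUMED model: log(1 + cases) = a + b * log(1 + outflow), fitted by OLS.

    Returns (a, b, r_squared), with R^2 computed on the log scale.
    """
    xs = [math.log1p(x) for x in outflow]
    ys = [math.log1p(y) for y in cases]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx       # slope on the log scale
    a = my - b * mx     # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


def predict_cases(a, b, outflow):
    """Expected case count for a prefecture, inverting the log(1 + .) transform."""
    return math.expm1(a + b * math.log1p(outflow))


# Hypothetical cross-section: aggregate outflow and confirmed cases on one day.
outflow = [500, 2_000, 8_000, 30_000, 120_000]
cases = [1, 4, 10, 45, 160]
a, b, r2 = fit_power_law(outflow, cases)
print(f"b = {b:.2f}, R^2 = {r2:.3f}")
print(f"expected cases for outflow 10,000: {predict_cases(a, b, 10_000):.1f}")
```

Refitting on each date's case counts mirrors the cross-sectional analysis. A dynamical version would also have to model how counts evolve from day to day, which this sketch does not attempt.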
The authors also considered variants of these models with additional independent variables: population size and GDP (a gravity score). These variables slightly improve the fit (R² rises from 0.927 to 0.957 for the dynamical model). Nevertheless, population flow becomes increasingly dominant over time, while the other variables become progressively less predictive.
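The summary does not say exactly how population size and GDP enter the model. As an illustration, the sketch below treats log outflow, log population and log GDP as linear predictors of log case counts. It then compares the in-sample R² of the flow-only and extended regressions. To avoid third-party packages, it solves the least-squares normal equations directly. For real data, a QR-based solver such as `numpy.linalg.lstsq` would be numerically safer. The data and variable names are hypothetical. One general property is worth keeping in mind: for nested least-squares models with an intercept, adding predictors can never lower in-sample R². A small gain on its own therefore says little about the extra variables' predictive value.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols_r2(predictors, y):
    """OLS with intercept; predictors is a list of columns. Returns (beta, R^2)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    Xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve_linear(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    my = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical prefecture-level data, all on a log scale.
log_outflow = [2.0, 3.1, 4.2, 5.0, 6.3, 7.1]
log_population = [5.1, 6.0, 5.5, 6.8, 6.2, 7.0]
log_gdp = [3.0, 3.5, 3.2, 4.1, 3.9, 4.4]
log_cases = [0.9, 1.8, 2.6, 3.5, 4.1, 4.9]

_, r2_flow = ols_r2([log_outflow], log_cases)
beta_full, r2_full = ols_r2([log_outflow, log_population, log_gdp], log_cases)
print(f"outflow only:        R^2 = {r2_flow:.3f}")
print(f"outflow + pop + GDP: R^2 = {r2_full:.3f}")
print("coefficients (intercept, outflow, population, GDP):",
      [round(c, 3) for c in beta_full])
```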
Overall, the models’ performance continuously improved as more infection cases were confirmed, suggesting that the spreading pattern of the virus gradually converges to the distribution of the population outflow from Wuhan to other prefectures in China.
The population outflow from Wuhan is thus shown to be an effective predictor of the spatio-temporal distribution of the COVID-19 epidemic. The authors presented epidemiological models that effectively predict the distribution of infections across China from population flow data. Moreover, they suggest using these models to estimate a total transmission risk index for each prefecture. Comparing these estimates with the real data (i.e. the confirmed cases) could help identify prefectures that are outperforming expectations (fewer cases than predicted, possibly thanks to highly successful public health measures) or underperforming (more cases than predicted, for instance because of stronger community transmission).
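The comparison between model estimates and confirmed cases can be sketched as follows. The sketch assumes a simple observed-to-expected ratio as the index, with an arbitrary ±25% tolerance band. Both are hypothetical choices, since the summary does not specify how the authors' risk index is defined. In practice the expected counts would come from a fitted model. Here they are invented:

```python
def classify_prefectures(observed, expected, tolerance=0.25):
    """Compare confirmed cases with model-expected cases for each prefecture.

    ASSUMED index: ratio = observed / expected. Returns {name: (ratio, label)},
    where the label is 'outperforming' (fewer cases than expected),
    'underperforming' (more cases than expected) or 'as expected'.
    """
    result = {}
    for name, obs in observed.items():
        exp_cases = expected[name]
        if exp_cases <= 0:
            raise ValueError(f"expected count for {name} must be positive")
        ratio = obs / exp_cases
        if ratio < 1.0 - tolerance:
            label = "outperforming"
        elif ratio > 1.0 + tolerance:
            label = "underperforming"
        else:
            label = "as expected"
        result[name] = (ratio, label)
    return result


# Hypothetical prefectures: confirmed cases vs. counts predicted from outflow.
observed = {"Prefecture A": 40, "Prefecture B": 100, "Prefecture C": 210}
expected = {"Prefecture A": 100.0, "Prefecture B": 95.0, "Prefecture C": 120.0}
for name, (ratio, label) in classify_prefectures(observed, expected).items():
    print(f"{name}: observed/expected = {ratio:.2f} -> {label}")
```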