I crunched some data related to the COVID-19 pandemic, following an impulse after I came across two separate news items:
- Descartes Labs released a new set of mobility stats, based on mobile phone data aggregated at the US county level (full disclosure: I used to work for Descartes Labs).
- On Twitter I saw a pre-print paper by a group of epidemiologists: “Estimating the effect of physical distancing on the COVID-19 pandemic using an urban mobility index”
The paper uses the Citymapper Mobility Index, which is published for 41 major cities across the world. The Descartes Labs data contains a very similar mobility index, but it’s limited to the US and much denser, covering individual US counties. (Both datasets likely have problematic biases since they come from specific populations of mobile phone users, but that’s a discussion for another day.)
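For a quick first look before the full analysis below, here is a minimal sketch of loading the Descartes Labs dataset and checking its county-level granularity. The URL and column names (`fips`, `admin_level`, `m50_index`) are the same ones used in the code later in this post; treat the snippet as illustrative rather than definitive:

```python
import pandas as pd

# Load the Descartes Labs mobility data: one row per region per day.
# Per the repository's documentation, m50_index expresses mobility as a
# percentage of a pre-pandemic baseline.
mobility = pd.read_csv(
    "https://raw.githubusercontent.com/descarteslabs/DL-COVID-19/master/DL-us-mobility-daterow.csv",
    dtype={"fips": str},
    parse_dates=["date"],
)

# County-level rows are admin_level == 2; peek at a few to confirm.
print(mobility[mobility["admin_level"] == 2].head())
```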
I’m no epidemiologist, so I can’t judge the merits of the paper, but it did give me a glimpse into how an epidemiologist might think about putting these kinds of datasets together. So I set out to recreate this figure from the paper using the Descartes Labs data:
I used Python with pandas, and The New York Times dataset as a source for county-level COVID-19 case counts. The full code is available in this Jupyter notebook. In the resulting graph the dots now represent New York State counties with at least 20 cumulative confirmed cases of COVID-19 by March 29 (excluding New York City counties), while the axes correspond directly to the original figure:
At least the trend looks right.
In the notebook I included a lot of intermediate results to make it easy to follow along, but stripped down, the code is quite compact: an awesome demonstration of the power of pandas and the SciPy ecosystem in general:
```python
# This code is released under an MIT license. In notebook form with more explanation:
# https://github.com/nikhaldi/covid-notebooks/blob/master/Mobility%20vs%20COVID-19%20growth%20plot.ipynb
import numpy as np
import pandas as pd

# NYT cumulative COVID-19 case counts per US county per day.
us_counties = pd.read_csv(
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv",
    dtype={"fips": str},
    parse_dates=["date"],
)

# Descartes Labs mobility index per US county per day.
mobility = pd.read_csv(
    "https://raw.githubusercontent.com/descarteslabs/DL-COVID-19/master/DL-us-mobility-daterow.csv",
    dtype={"fips": str},
    parse_dates=["date"],
)

# Compare case growth in the fourth week of March against mobility in the
# second week, reflecting the lag between behavior changes and reported cases.
us_counties_march_week4 = us_counties.set_index("date").loc["2020-03-23":"2020-03-29"]
mobility_march_week2 = mobility.set_index("date").loc["2020-03-09":"2020-03-15"]

# Restrict both datasets to New York at the county level (admin_level == 2).
# Note: the NYT reports all of New York City as a single row with no FIPS
# code, so it drops out of the fips groupby below.
us_counties_march_week4_ny = us_counties_march_week4[us_counties_march_week4["state"] == "New York"]
mobility_march_week2_ny = mobility_march_week2[
    (mobility_march_week2["admin_level"] == 2) & (mobility_march_week2["admin1"] == "New York")
]

# Mean day-over-day growth rate of a series of cumulative counts.
def mean_daily_growth(series):
    return np.mean(series / series.shift(1) - 1)

us_counties_march_week4_ny_mean_growth = us_counties_march_week4_ny[["fips", "cases"]].groupby("fips").aggregate(
    max_cases=pd.NamedAgg(column="cases", aggfunc=np.max),
    mean_daily_growth=pd.NamedAgg(column="cases", aggfunc=mean_daily_growth),
)
mobility_march_week2_ny_mean = mobility_march_week2_ny.groupby("fips").mean()

# Keep only counties with at least 20 cumulative cases by the end of the week,
# then join case growth with the week-two mobility index on the county FIPS code.
min_cases_threshold = 20
merged = pd.merge(
    us_counties_march_week4_ny_mean_growth[
        us_counties_march_week4_ny_mean_growth["max_cases"] >= min_cases_threshold
    ],
    mobility_march_week2_ny_mean["m50_index"],
    on="fips",
)

# Fit a line to log(growth rate) vs. mobility index and overlay it on the scatter.
slope, intercept = np.polyfit(merged.m50_index, np.log(merged.mean_daily_growth), deg=1)
y_log_estimated = slope * merged.m50_index + intercept
ax = merged.plot.scatter(x="m50_index", y="mean_daily_growth", figsize=(10, 10))
ax.plot(merged.m50_index, np.exp(y_log_estimated))
```
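To put a number on “the trend looks right”, one could also compute the correlation between the mobility index and the log growth rate. This isn’t in the notebook; it’s a minimal sketch, assuming the `merged` DataFrame from the listing above:

```python
import numpy as np
from scipy import stats

# Pearson correlation between the week-two mobility index and the log of the
# week-four mean daily case growth rate, continuing from `merged` above.
r, p_value = stats.pearsonr(merged.m50_index, np.log(merged.mean_daily_growth))
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
```

The sign and magnitude of r give a quick sanity check on the direction and strength of the fitted line.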
Even though I have experience with NumPy and other parts of the SciPy ecosystem, I hadn’t used pandas before, so another motivation for this mini-project was to apply pandas to something practical. The bulk of the work was done within about three hours.
This isn’t serious science, of course; I’m not a scientist (but some of my best friends are!). It’s a proof of concept, proving to myself that I understand these datasets and that I roughly understand what was happening in that paper. I’m now thinking about how to build more serious tools around this idea of correlating physical distancing measures with COVID-19 outcomes.