How effective is social distancing at combating the spread of COVID-19? I explained in my previous post how I was inspired to look at correlating mobility data with COVID-19 growth to begin to answer that question. How much people move around is a decent proxy for how strictly social distancing rules are being observed and we can exploit differences in mobility between regions to evaluate the impact of social distancing.
This is still me riffing on a paper by actual epidemiologists that I came across (Estimating the effect of physical distancing on the COVID-19 pandemic using an urban mobility index). I have to stress once more that I’m not a scientist, much less an epidemiologist. I have experience building tools for scientists though, so I decided to contribute dashboards that might allow an expert to explore the connection between mobility and COVID-19 spread.
There are lots of tools out there for creating interactive dashboards, and in the interest of rapid progress I wanted to use them rather than build something custom from scratch. I settled on three fundamentally different tools: Google Spreadsheets as a traditional spreadsheet application, Tableau as the most prominent business intelligence application and Jupyter Notebooks with widgets as an interactive programming environment widely used by scientists.
Yes, I built essentially the same thing three times because I suspected I could bang these out quickly. I did not know ahead of time which approach was going to work best or which one would be most practical for others to build on – I had never used Tableau or Jupyter widgets before.
The requirements for each dashboard were:
- Import the Descartes Labs mobility and New York Times COVID-19 data, preferably in realtime
- Draw a scatter plot of mobility vs pandemic growth by US county, and make it as interactive as possible
- Make the result available in a way that allows other people to easily use it and build on it
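To make the second requirement concrete, here is a minimal sketch in Python with pandas of the aggregation that sits behind such a scatter plot. It is not the code behind any of the three dashboards. The column names (`fips`, `date`, `m50_index`, `cases`), the helper name `county_summary` and the definition of growth as a compound daily rate of cumulative cases are all assumptions for illustration, and the tiny data frames are made up so the example runs without downloading anything.

```python
import pandas as pd


def county_summary(mobility, cases, start, end):
    """Per-county mean mobility and compound daily case growth in a date window.

    Hypothetical helper: column names and the growth definition are assumptions.
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)

    # Restrict both tables to the same fixed date window.
    mob = mobility[mobility["date"].between(start, end)]
    cas = cases[cases["date"].between(start, end)].sort_values("date")

    # Mean mobility index per county over the window.
    mean_mobility = mob.groupby("fips")["m50_index"].mean().rename("mean_mobility")

    # Compound daily growth of cumulative cases between the first and last
    # observation of each county inside the window.
    grouped = cas.groupby("fips")
    first, last = grouped["cases"].first(), grouped["cases"].last()
    days = (grouped["date"].last() - grouped["date"].first()).dt.days
    growth = ((last / first) ** (1 / days) - 1).rename("daily_growth")
    growth = growth[days > 0]  # a single observation gives no growth estimate

    # Inner join keeps only counties present in both sources.
    return pd.concat([mean_mobility, growth], axis=1, join="inner").dropna()


# Made-up sample rows, for illustration only.
mobility = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02"]),
    "fips": ["36061", "36061", "06037", "06037"],
    "m50_index": [100, 50, 80, 60],
})
cases = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-03", "2020-03-01", "2020-03-03"]),
    "fips": ["36061", "36061", "06037", "06037"],
    "cases": [10, 40, 5, 5],
})

summary = county_summary(mobility, cases, "2020-03-01", "2020-03-03")
print(summary)
```

Each row of `summary` is one point of the scatter plot: mean mobility on one axis, daily growth on the other. Keeping the date window as a pair of parameters is what makes the plot interactive once those parameters are wired to a slider or date picker.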
Working on the Google Spreadsheets version was like a warm bath of nostalgia because I worked on this product at Google from 2010 to 2012. I hadn’t done anything so fancy in a spreadsheet for a while but this came together quickly and intuitively.
However, Google Spreadsheets is a collaborative spreadsheet editing application – it’s not designed for interactive dashboards that multiple people can use at the same time. The shared spreadsheet is view-only; to interact with the “dashboard” you need to create your own copy of the document (and need to be signed into a Google account).
I also found out that the raw data CSV files are too big to import dynamically from a URL, so sadly the raw data in the spreadsheet is static and I will have to update it manually once in a while.
As a complete Tableau newbie this was rough going. I’m used to thinking about data in relational terms (tables, aggregations, joins) but Tableau has invented its own universe of concepts that only vaguely map to relational concepts. I can’t say I understand their choices from either a product point of view or a technical one. In the beginning I spent way too much time struggling with idiosyncratic concepts, even though there now exists half a century of literature about relational-style backends, APIs and UIs. Google Spreadsheets manages to implement a full relational algebra in a browser. What’s your excuse, Tableau?
I struggled the hardest exactly at the edges of Tableau’s concept of relations, where I needed to aggregate and join the two raw data sources. I realized Tableau is indeed very much a visualization tool, not a number crunching tool. I think in a real project I am meant to pre-aggregate the data either myself or in an add-on called Tableau Prep. In order to keep the workbook as self-contained as possible I stuck with ingesting the raw data and accepted the limitations, mainly that I had to fix the date ranges for the mean mobility index and mean pandemic growth rate. (I may be missing or misunderstanding features; any suggestions for improvements are very welcome.)
On the upside, it was easy to build the interactive features of the dashboard and it was trivial to publish and share it through Tableau Public.
There are good reasons why Jupyter Notebooks with Python have become the de facto standard programming environment in certain science communities. They are well suited to the kind of explorative, iterative programming that is often at the core of teasing insights out of data. To me, as a veteran Python programmer, out of the three approaches this one felt the most extensible and powerful.
But I have mixed feelings about this solution. The power unlocked by notebooks also makes this “dashboard” intimidating or plain unusable to a general audience. I doubt that somebody who is not a programmer or has never seen a running notebook can figure out what to do when they launch it.
I’ve also held for a long time that the notebook model has some inherent flaws. The statefulness of the runtime makes collaborating on or sharing a notebook fundamentally tricky. I appreciate an effort like Binder, which gives you an ad-hoc runtime for notebooks and which works nicely for my purposes here, but collaboration, sharing and version control still look to me like largely unsolved problems in the Jupyter ecosystem. Not to go on too wide a tangent but I have wondered if Jupyter people have huddled on these topics with Smalltalk people who have some decades of experience with very stateful programming environments.
Ultimately, none of the dashboarding tools I explored were able to completely fulfil all requirements, not in the short time available anyway. On the other hand, I did manage to prototype a usable dashboard in each of the tools in the space of less than three days total. I can’t tell which of these might be most useful to a casual or expert user – let’s see what sticks.
Any feedback, comments or suggestions for improvements? Any other tools I should have explored? Let me know.