Lockdown Hack #5: Personal YouTube Charts

(About Lockdown Hacks)

webstore-screenshot

Where

In the Chrome webstore
Plus: Source code on GitHub

Stack

Chrome extension, jQuery, unofficial YouTube APIs

Notes

This is a YouTube feature I’ve been wanting for a while: I know there are videos, especially music videos, that I return to over and over again, and I would like to see which ones I’ve viewed the most. YouTube has a watch history but doesn’t offer a tally.

So I wanted to bolt on that feature. Unfortunately, the YouTube Data API does not grant access to the watch history – it was removed from the API at some point, maybe because that information is more privacy-sensitive than anything else in the API. My best bet was to create a Chrome extension, which suited me well enough: I wanted to learn more about extensions anyway.

Chrome extensions are based on the time-tested combination of HTML, CSS & JS, with some specific JS APIs thrown in. I appreciated how this lowered the barrier to entry for somebody familiar with web development. The programming model is awkward though, for the same reason it is in many plugin/extension frameworks: there isn’t a real development environment – an IDE, so to speak. Notice, for example, how the developer documentation is silent on the topic of automated testing. It’s easy enough to load an extension locally in Chrome but then development becomes a tedious cycle of code editing, reloading the extension and manual testing.

My main challenge was YouTube itself anyway. Chrome extensions let you run your JS in the context of an existing page – such as the YouTube watch history page – which gives you access to the DOM, safe XHR requests to the page’s origin, and even global JS variables (not a documented feature – but the hacks work). Originally I thought I was going to scan the history via the DOM but I had to give up on that. The DOM is dauntingly complex, using a ton of web components, and scrolling down far enough to capture a sizable chunk of the history brought even my high-end MacBook to a crawl. So instead I reverse engineered the XHR requests that the page makes to browse through the history.
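
Once the history entries are in hand, the tallying step itself is small. Here is a minimal sketch in Python (the extension itself is JavaScript with jQuery), assuming each history entry has already been reduced to a dict with the hypothetical keys `video_id` and `title`. It does not reflect the actual response format of YouTube's undocumented endpoints.

```python
from collections import Counter


def tally_views(history_entries):
    """Count how often each video appears in a list of watch-history entries.

    Each entry is assumed to be a dict with the hypothetical keys
    "video_id" and "title"; this is not YouTube's real response format.
    """
    counts = Counter(entry["video_id"] for entry in history_entries)
    titles = {entry["video_id"]: entry["title"] for entry in history_entries}
    # most_common() returns (video_id, count) pairs, highest count first.
    return [(titles[video_id], n) for video_id, n in counts.most_common()]


# Made-up sample history, newest entry first.
history = [
    {"video_id": "a1", "title": "Song A"},
    {"video_id": "b2", "title": "Talk B"},
    {"video_id": "a1", "title": "Song A"},
    {"video_id": "a1", "title": "Song A"},
]
chart = tally_views(history)
print(chart)
```

Keying the count on the video ID rather than the title keeps the tally stable if two videos happen to share a title.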

I decided to use jQuery because I couldn’t stomach writing raw JS. With more time and patience I would have tried setting up a more sophisticated environment with package management and something like webpack. Rest assured I would never use jQuery in a project not explicitly designated as a hack!

jQuery spaghetti notwithstanding, the result seemingly works quite well. Of course it depends on undocumented YouTube APIs and UI elements, so it will be correspondingly brittle. I released the source code on GitHub in the (probably futile) hope that, if the extension does need maintenance to keep it compatible as YouTube makes changes, somebody might be able to contribute to it.

(Failed) Lockdown Hack #4: Prontissimo

Screenshot 2020-04-23 at 15.05.25

Stack

Amazon Product Advertising API, Ruby on Rails, Vue.js

Notes

Not burying the lede here, I failed to achieve what I set out to do with this one. (The screenshot above only reflects how far I got in terms of the frontend – yes, I spent 30 minutes designing a logo, what of it?)

I still think the idea was pretty good: Hack Amazon search to sort results by fastest delivery. Amazon does not let you sort this way, and I think I understand why. It wouldn’t normally be a useful feature because you can just filter by Prime eligibility, which would indicate a one- or two-day delivery. Furthermore, delivery estimates for non-Prime items, where Amazon doesn’t tightly control the supply chain, are probably unreliable. But during this pandemic I noticed that Prime delivery has been all over the place here in the UK, sometimes weeks out. When online shopping is the most convenient or the only way to procure certain stuff, you do sometimes care intensely about picking the item that will be delivered the fastest.

I skimmed the documentation of Amazon’s Product Advertising API and decided I would be able to cobble together at least a proof of concept. Then over the next day and a half I faced enough hurdles that I eventually gave up. There were two major issues: Getting access to Amazon’s Product Advertising API is an involved process, and the API does not return the kind of data (delivery estimates) I needed, as far as I can tell.

I had skipped over the part in the documentation where it explains that to use the API you need to be an Amazon Associate. Not only that, you need to have made at least 3 qualified sales through the associate program and need “final acceptance” which involves human review. I did sign up as an associate in the US and in the UK and even managed to broker 3 sales in the US (with a little help from my friends). But the review process is opaque enough that I suspect it will take days to get final acceptance. I did contact the US and UK associate programs separately about fast-tracking but the replies did not give me confidence the rules are bendable in any way.

Because I hadn’t been able to try out real API requests and responses it also took me too long to realize that the API doesn’t provide the delivery estimates I needed (I think). This is the major takeaway for me from this failed hack: Good documentation is clearly a prerequisite for a successful API but it’s just as important to make it trivial for users to start exercising it with real data, even if it’s under restricted trial conditions, as Twilio does, for example. It’s hard for a developer to project what an API will enable them to do by just looking at documentation.

I understand Amazon might have valid strategic reasons for restricting API access. The API has some potential as an abuse vector where somebody could recreate, for example, the Amazon homepage in a way that undermines the Amazon brand. But then again, don’t the tight rate limits – 1 request per second to start with – protect against most abuse? My product intuition is that the upside of removing barriers to entry trumps potential downsides here.

Lockdown Hack #3: Whence

Screenshot 2020-04-20 at 20.12.29

Where

https://nikhaldimann.com/whence

What

A game of guessing random locations on Earth based only on satellite views

Stack

Google Maps JS API, Vue.js

Notes

Perhaps obviously, this hack was inspired by the popular game GeoGuessr, where you move around Street View to guess your location. It’s fun enough that they managed to spin it into a whole franchise, complete with a paywall and professional leagues. I remember when it was a simple free game, years ago.

Having worked a lot with satellite imagery over the last years, I had carried around this idea for a long time – applying a similar game mechanic to just satellite views rather than Street View. Scoring is based on how close your guess is and how many times you zoomed out.
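
To make the scoring idea concrete, here is a rough sketch in Python. The great-circle distance uses the standard haversine formula. The scoring rule itself (exponential decay with distance, a multiplicative penalty per zoom-out, and all constants) is a made-up illustration, not the formula Whence actually uses.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def score_guess(distance_km, zoom_outs, max_points=1000,
                distance_scale_km=2000.0, zoom_penalty=0.9):
    """Hypothetical scoring rule: points decay exponentially with distance,
    and every zoom-out multiplies the result by zoom_penalty."""
    distance_factor = math.exp(-distance_km / distance_scale_km)
    return round(max_points * distance_factor * zoom_penalty ** zoom_outs)


# Example: the true location is London, the guess is Paris, after two zoom-outs.
distance = haversine_km(51.5074, -0.1278, 48.8566, 2.3522)
points = score_guess(distance, zoom_outs=2)
print(round(distance), points)
```

With a rule shaped like this, a far-off educated guess still earns some points, which matches the intent of rewarding guesses rather than demanding exact answers.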

During my brief research into similar games I also found the Map Zoom Quiz. In my view they made a fatal design decision to have players type in their answer. That makes it too hard, plus reaching for the keyboard in a game like this just feels wrong. Whence is still hard but you do get points for making an educated guess. I also observed I got better at it over time as I indeed learned to differentiate the world’s terrains more precisely and to latch onto specific clues (agricultural landscape looks very different in Asia vs North America!).

I didn’t face any real technical challenge during this – roughly two days of programming and game design, then one day of UI design and polish. The Google Maps API is mature and rich yet simple, and it was really comfortable to use with an appropriate Vue wrapper. I think I spent a disproportionate time on UI design and game design, both disciplines out of my comfort zone of course. I’m still not satisfied with the UI. It’s particularly awkward on small screens but I got tired of fiddling with it.

How does this hack tie into the lockdown condition? Just think of it as virtual travel, since real travel hasn’t been in the cards for most of the world recently. But really, it was just an idea that had been stuck in my head for quite some time and I needed to scratch the itch.

Lockdown Hack #2: Towards proving that social distancing works

How effective is social distancing at combating the spread of COVID-19? I explained in my previous post how I was inspired to look at correlating mobility data with COVID-19 growth to begin to answer that question. How much people move around is a decent proxy for how strictly social distancing rules are being observed and we can exploit differences in mobility between regions to evaluate the impact of social distancing.

This is still me riffing on a paper by actual epidemiologists I came across (“Estimating the effect of physical distancing on the COVID-19 pandemic using an urban mobility index” by Jean-Paul R. Soucy, Shelby L. Sturrock, Isha Berry, Nick Daneman, Derek R. MacFadden and Kevin A. Brown). I have to stress once more that I’m not a scientist, much less an epidemiologist. I have experience building tools for scientists though, so I decided to contribute dashboards that maybe would allow an expert to explore the connection between mobility and COVID-19 spread.

There are lots of tools out there for creating interactive dashboards and, if only in the interest of rapid progress, I wanted to use them rather than build something custom from scratch. I settled on three fundamentally different tools: Google Spreadsheets as a traditional spreadsheet application, Tableau as the most prominent business intelligence application and Jupyter Notebooks with widgets as an interactive programming environment widely used by scientists.

Yes, I built essentially the same thing three times because I suspected I could bang these out quickly. I did not know ahead of time which approach was going to work best or which one would be most practical for others to build on – I had never used Tableau or Jupyter widgets before.

The requirements for each dashboard were:

  • Import the Descartes Labs mobility and New York Times COVID-19 data, preferably in realtime
  • Draw a scatter plot of mobility vs pandemic growth by US county, and make it as interactive as possible
  • Make the result available in a way that allows other people to easily use it and build on it

Google Spreadsheets

View the spreadsheet

Screenshot 2020-04-14 at 16.46.52

Working on this was like a warm bath of nostalgia because I worked on this product at Google from 2010 to 2012. I hadn’t done anything so fancy in a spreadsheet for a while but this came together quickly and intuitively.

However, Google Spreadsheets is a collaborative spreadsheet editing application – it’s not designed for interactive dashboards that multiple people can use at the same time. The shared spreadsheet is view-only; to interact with the “dashboard” you need to create your own copy of the document (and need to be signed into a Google account).

I also found out that the raw data CSV files are too big to import dynamically from a URL, so sadly the raw data in the spreadsheet is static and I will have to update it manually once in a while.

Tableau

View the dashboard

Screenshot 2020-04-14 at 16.51.05

As a complete Tableau newbie this was rough going. I’m used to thinking about data in relational terms (tables, aggregations, joins) but Tableau has invented its own universe of concepts that only vaguely map to relational concepts. I can’t say I understand their choices from either a product point of view or a technical one. In the beginning I spent way too much time struggling with idiosyncratic concepts, even though half a century of literature about relational-style backends, APIs and UIs exists by now. Google Spreadsheets manages to implement a full relational algebra in a browser. What’s your excuse, Tableau?

I struggled the hardest exactly at the edges of Tableau’s concept of relations where I needed to aggregate and join the two raw data sources. I realized Tableau is indeed very much a visualization tool, not a number crunching tool. I think in a real project I am meant to pre-aggregate the data either myself or in an add-on called Tableau Prep. In order to keep the workbook as self-contained as possible I stuck with ingesting the raw data and accepted the limitations, mainly that I had to fix the date ranges for the mean mobility index and mean pandemic growth rate. (I may be missing or misunderstanding features, any suggestions for improvements are very welcome.)

On the upside, it was easy to build the interactive features of the dashboard and it was trivial to publish and share it through Tableau Public.

Jupyter Widgets

Launch the notebook in Binder (may be slow to start up – run all cells to get the interactive plot at the bottom)
Or view the source (not interactive and does not show the plot)

Screenshot 2020-04-14 at 16.55.31

There are good reasons why Jupyter Notebooks with Python have become the de facto standard programming environment in certain science communities. They are well suited to the kind of explorative, iterative programming that is often at the core of teasing insights out of data. To me, as a veteran Python programmer, out of the three approaches this one felt the most extensible and powerful.

But I have mixed feelings about this solution. The power unlocked by notebooks also makes this “dashboard” intimidating or plain unusable to a general audience. I doubt that somebody who is not a programmer or has never seen a running notebook can figure out what to do when they launch it.

I’ve also held for a long time that the notebook model has some inherent flaws. The statefulness of the runtime makes collaborating on or sharing a notebook fundamentally tricky. I appreciate an effort like Binder, which gives you an ad-hoc runtime for notebooks and which works nicely for my purposes here, but collaboration, sharing and version control still look to me like largely unsolved problems in the Jupyter ecosystem. Not to go on too wide a tangent but I have wondered if Jupyter people have huddled on these topics with Smalltalk people who have some decades of experience with very stateful programming environments.
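
The statefulness problem can be shown with a toy model. The sketch below is an illustration only, not how Jupyter is implemented: it treats notebook cells as code strings executed against one shared namespace, and shows that the same cells give different results depending on the order and number of times they are run.

```python
def run_cells(cells, order):
    """Execute notebook-style cells in the given order against one shared namespace."""
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace


cells = [
    "x = 1",       # cell 0
    "x = x + 1",   # cell 1
    "y = x * 10",  # cell 2
]

# Running top to bottom once, versus accidentally re-running cell 1.
top_to_bottom = run_cells(cells, [0, 1, 2])
rerun_cell_1 = run_cells(cells, [0, 1, 1, 2])
print(top_to_bottom["y"], rerun_cell_1["y"])
```

The saved notebook looks identical in both cases, which is why a shared or version-controlled notebook cannot tell a reader which state its outputs came from.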

Conclusion

Ultimately, none of the dashboarding tools I explored were able to completely fulfil all requirements, not in the short time available anyway. On the other hand, I did manage to prototype a usable dashboard in each of the tools in the space of less than three days total. I can’t tell which of these might be most useful to a casual or expert user – let’s see what sticks.

Any feedback, comments or suggestions for improvements? Any other tools I should have explored? Let me know.

Lockdown Hack interlude: Correlating mobility stats & pandemic growth

I crunched some data related to the COVID-19 pandemic, following an impulse after I came across two separate news items:

The paper uses the Citymapper Mobility Index which is being published for 41 major cities across the world. The Descartes Labs data contains a very similar mobility index but it’s limited to the US and a lot denser, covering individual US counties. (Both datasets likely have problematic biases since they come from specific populations of mobile phone users but that’s a discussion for another day.)

I’m no epidemiologist so I can’t judge the merit of the paper but it did give me a glimpse into how an epidemiologist might think about putting these kinds of datasets together. So I set out to recreate this figure from the paper using the Descartes Labs data:

Screenshot 2020-04-11 at 16.40.13

I used Python with pandas, and the New York Times as a source for COVID-19 data at the county level. The full code is available in this Jupyter notebook. In the resulting graph the dots now represent counties of New York state with at least 20 cumulative confirmed cases of COVID-19 by March 29 (excluding New York City counties), while the axes correspond directly to the original figure:

us-counties

At least the trend looks right.

In the notebook I included a lot of intermediate results to make it easy to follow along, but stripped down, the code is quite compact. It’s an awesome demonstration of the power of pandas and the SciPy ecosystem in general:


# This code released under an MIT license. In notebook form with more explanation:
# https://github.com/nikhaldi/covid-notebooks/blob/master/Mobility%20vs%20COVID-19%20growth%20plot.ipynb
import numpy as np
import pandas as pd

# Load county-level COVID-19 case counts (New York Times) and mobility data (Descartes Labs).
us_counties = pd.read_csv(
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv",
    dtype={"fips": str},
    parse_dates=["date"]
)
mobility = pd.read_csv(
    "https://raw.githubusercontent.com/descarteslabs/DL-COVID-19/master/DL-us-mobility-daterow.csv",
    dtype={'fips': str},
    parse_dates=['date']
)

# Cases in the fourth week of March, mobility in the second week of March.
us_counties_march_week4 = us_counties.set_index('date').loc["2020-03-23":"2020-03-29"]
mobility_march_week2 = mobility.set_index("date").loc["2020-03-09":"2020-03-15"]

# Restrict both datasets to counties (admin_level 2) in New York state.
us_counties_march_week4_ny = us_counties_march_week4[(us_counties_march_week4["state"] == "New York")]
mobility_march_week2_ny = mobility_march_week2[
    (mobility_march_week2["admin_level"] == 2) & (mobility_march_week2["admin1"] == "New York")
]


def mean_daily_growth(series):
    # Day-over-day relative growth: today's count divided by yesterday's, minus 1.
    return np.mean(series / series.shift(1) - 1)


# Per county: the maximum case count and the mean daily growth rate for the week.
us_counties_march_week4_ny_mean_growth = us_counties_march_week4_ny[["fips", "cases"]].groupby("fips").aggregate(
    max_cases=pd.NamedAgg(column="cases", aggfunc=np.max),
    mean_daily_growth=pd.NamedAgg(column="cases", aggfunc=mean_daily_growth)
)

# Per county: mean of the numeric mobility columns for the week.
# numeric_only=True skips text columns such as admin1, which newer pandas versions refuse to average.
mobility_march_week2_ny_mean = mobility_march_week2_ny.groupby("fips").mean(numeric_only=True)

# Keep counties with enough cases and join growth with the mobility index on the FIPS code.
min_cases_threshold = 20
merged = pd.merge(
    us_counties_march_week4_ny_mean_growth[
        us_counties_march_week4_ny_mean_growth["max_cases"] >= min_cases_threshold
    ],
    mobility_march_week2_ny_mean["m50_index"],
    on="fips"
)

# Fit a line to log(growth) vs mobility, then plot the scatter with the fitted curve.
slope, intercept = np.polyfit(merged.m50_index, np.log(merged.mean_daily_growth), deg=1)
y_log_estimated = slope * merged.m50_index + intercept
ax = merged.plot.scatter(x="m50_index", y="mean_daily_growth", figsize=(10,10))
ax.plot(merged.m50_index, np.exp(y_log_estimated))

Even though I have experience with numpy and other parts of the SciPy ecosystem, I hadn’t used pandas before. Another motivation for this mini-project was to apply pandas to something practical. The bulk of this work was done within 3 hours.
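
The two numeric steps in the code above, the mean daily growth rate and the log-linear fit, can be checked by hand on tiny inputs. The sketch below restates them in plain Python without pandas or numpy. The case counts and mobility values are made up for illustration, and `fit_log_linear` is a hypothetical helper that computes the ordinary least-squares line that `np.polyfit(x, np.log(y), deg=1)` is used for above.

```python
import math


def mean_daily_growth(cases):
    """Mean day-over-day relative growth; 0.5 means +50% per day on average."""
    ratios = [today / yesterday - 1 for yesterday, today in zip(cases, cases[1:])]
    return sum(ratios) / len(ratios)


def fit_log_linear(xs, ys):
    """Ordinary least-squares fit of log(y) = slope * x + intercept."""
    log_ys = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log_y = sum(log_ys) / n
    covariance = sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(xs, log_ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_log_y - slope * mean_x
    return slope, intercept


# Made-up cumulative case counts for one county over four days:
# daily growth is +50%, +50%, +20%, so the mean is 40%.
growth = mean_daily_growth([20, 30, 45, 54])

# Made-up points lying exactly on growth = exp(0.5 * mobility - 1).
mobility_values = [0.0, 1.0, 2.0, 3.0]
growth_values = [math.exp(0.5 * m - 1) for m in mobility_values]
slope, intercept = fit_log_linear(mobility_values, growth_values)
print(round(growth, 3), round(slope, 3), round(intercept, 3))
```

Fitting in log space means the straight line becomes an exponential curve when plotted against the raw growth rates, which is why the original code passes the fitted values through `np.exp` before drawing them.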

This isn’t serious science of course – I’m not a scientist (but some of my best friends are!). It’s a proof of concept, proving to myself that I understand these datasets and that I roughly understand what was happening in that paper. I’m now thinking about how to build more serious tools around this idea of correlating physical distancing measures and COVID-19 outcomes.

Lockdown Hack #1: Hugster

Screenshot 2020-04-09 at 17.56.43

Where

https://nikhaldimann.com/hugster

What

Get gentle reminders by text to reach out to friends while social distancing.

Stack

Ruby on Rails, Vue.js, BootstrapVue, Twilio

Timeline

It took me about 3 days to arrive at the feature complete version as it is now. At that point it looked like this though:

Screenshot 2020-04-09 at 09.46.57

All the interactivity was in place and the UI elements were roughly lined up where I wanted them but I hadn’t spent any time on styling or visual considerations. I spent almost another full day just on UI/UX polishing. I know I said I wouldn’t mind unpolished outcomes but in this case I really felt the impact of decent design would be significant.

I can’t overstate how much BootstrapVue helped with the UX, even though I hadn’t ever used it before. It accounts for the somewhat cookie cutter look & feel but in earlier times it would have taken me days to recreate its prefab components (tab navigation, form input, modals etc.) with similar fidelity.

Caveats

The biggest caveat from a product point of view is that you have to enter your contacts manually. It’s an obvious feature to import contacts from Google as users are already authenticated against it. Implementing that and dealing with Google’s restrictions around it would be another multi-day project though.

Arguably this should be a mobile app which can tightly integrate with the contact data on users’ phones and use native push notifications rather than texts. I ruled that out for practical reasons: I estimate it would take me more than a week of scrambling to learn React Native while putting a mobile app together and then I would still be faced with Apple’s app store barriers, delaying an iOS launch even further. Mobile platforms are not great for rapid prototyping, especially if, like me, you haven’t built a mobile app in years.

Easy parts

Twilio handles all the sending of text messages. I’m only using a single API call of theirs – sending a text message – so it should be easy, but their sign up flow and developer experience were slicker than I expected. Total time spent on the Twilio integration was probably less than 1 hour.
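
For a sense of how small that single call is, here is a sketch in Python. Hugster itself is a Rails app, so this is not its code. It builds, without sending, the HTTP request for Twilio's Messages REST endpoint using only the standard library. The function name, credentials, phone numbers and message text are all placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_sms_request(account_sid, auth_token, from_number, to_number, body):
    """Build (but do not send) a POST request to Twilio's Messages endpoint."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Messages.json"
    # Twilio expects form-encoded To, From and Body parameters.
    form = urllib.parse.urlencode({"To": to_number, "From": from_number, "Body": body})
    # HTTP basic auth with the account SID as user name and the auth token as password.
    credentials = base64.b64encode(f"{account_sid}:{auth_token}".encode()).decode()
    return urllib.request.Request(
        url,
        data=form.encode(),
        headers={"Authorization": f"Basic {credentials}"},
        method="POST",
    )


# Placeholder credentials and numbers, not real ones.
request = build_sms_request(
    "ACxxxx", "secret", "+15005550006", "+447700900000", "Time to call Sam?"
)
print(request.get_method(), request.full_url)
# Passing the request to urllib.request.urlopen() would actually send the text.
```

In practice one of Twilio's official helper libraries would wrap this request, but the underlying call stays this simple.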

Hard parts

I’ve known this for a long time but the point was definitely driven home: Frontend development is a lot more time consuming than backend development for something like this where scalability on the backend is not a concern. I would say the split of my time was about 70% frontend, 30% backend.

This UI is about as simple as it can get but the number of states that the UI needs to account for is still kinda too high to hold in my head at once. This means countless loops of manual testing and a lot of code for what looks trivial on the surface.

I don’t think this is any fault of the tooling (the Vue.js developer experience has been very nice), it’s a systemic issue with UI development. Effort doesn’t scale linearly as you ramp up the interactivity (reactivity?) of the UI, it’s more like quadratic. I think people, including me, often underestimate the effort required to build a modern web frontend because that ramp has been steep lately – what’s technically possible and what users expect has evolved so rapidly.
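
A back-of-the-envelope sketch of why the effort grows faster than linearly, in Python. The state dimensions below are hypothetical and not taken from the Hugster code. The number of combined UI states is the product of the values per dimension, while the number of pairs of dimensions that can interact grows quadratically.

```python
from itertools import combinations, product

# Hypothetical state dimensions for a small form-based UI.
dimensions = {
    "auth": ["signed_out", "signed_in"],
    "contacts": ["loading", "empty", "loaded", "error"],
    "form": ["pristine", "invalid", "submitting", "saved"],
    "modal": ["closed", "open"],
}

# Every combination of one value per dimension is a state the UI could be in.
all_states = list(product(*dimensions.values()))

# Every pair of dimensions is a potential interaction to think through and test.
pairwise_interactions = list(combinations(dimensions, 2))


def pair_count(n):
    """Number of unordered pairs among n dimensions: n * (n - 1) / 2."""
    return n * (n - 1) // 2


print(len(all_states), len(pairwise_interactions), pair_count(5))
```

Adding a fifth dimension to these four adds four new pairs at once, so each new piece of interactivity costs more than the one before it.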

Lockdown Hacks

The COVID-19 pandemic began to seriously affect life in London just a couple of weeks after I had moved here and started a new job. I had deliberately chosen an employer with a business model rooted in physical space – in real estate – since I had tired a bit of businesses that are, let’s say, mainly virtual.

You can guess what happened next. Virtuality is an advantage in a worldwide pandemic, and real estate business has collapsed for the time being. So less than a month into the new job I was laid off. No hard feelings from my side – I would have let myself go too. My skill set mainly pays off in a growth phase, not in a hunker-down phase.

This marks week four under lockdown and social distancing conditions in London, meaning I have not been leaving the flat other than for grocery shopping or for occasional exercise. Considering the world’s ever gloomier backdrop I’m doing fine. I’m healthy. I’m not lacking anything (other than social contact!). I’m quite good at self-imposing structure on lots of free time. I’m sticking to a lean media diet to keep anxiety at bay. Most importantly, I will probably be employed again soon enough.

I stewed for a couple of days after my abrupt layoff but then a plan presented itself as obvious in my head: I will spend the next few lockdown weeks working on a series of hacks with some connection to the pandemic or the lockdown/quarantine conditions. I will pick ideas off the top of my head and see how far I can drive them with my software generalist expertise.

I don’t know if this will result in anything immediately useful. Some hacks will turn out unpolished or silly or they will fail completely, and that’s ok! The journey is the destination. What matters are the meta-goals:

  1. To flex my mind while stuck indoors without a real job.
  2. To try out some new languages, frameworks and tools.
  3. To plant seeds for potential projects beyond the pandemic.
  4. To bait potential collaborators, co-founders, employers.
  5. To relive a happy part of my youth, before I had all-consuming full-time jobs, when this mode of open-ended play constituted a large part of my life.

That last point is probably the most important one, to be honest. I miss the pleasure of just following where my mind is leading me, somewhat unfiltered, free from the constraints of long-term maintainability. I’m very good at building maintainable, scalable software over a long period of time in a collaborative manner – but this break in the world calls for some solo rapid prototyping.

I’m giving myself about 3 days to explore any given idea before I move on to the next. I will publish the results as I go and document them on this blog. Watch this space.