03 Aug 2021

Valuable Abstractions

A worthy endeavour

Canis lupus, Vulpes vulpes

Writing good software is about 90% creating good abstractions. Or at least that is the part that is most fun.

I have worked in a few different domains with wildly different requirements: app features, developer tools, big data frameworks. The most satisfying accomplishment in each domain usually involved creating an abstraction that helped others. I think these were so satisfying because creating a valuable abstraction is hard.

Programmers are clearly addicted to iterating on abstractions, searching for something better, sometimes for decades, as with the thread concurrency abstraction.

So how do you know when you’ve thought of a good one?

The goal

Abstractions attempt to hide complexity. If a user can treat a complex system as a black box, without understanding the underlying system, the abstraction is valuable. All the attention that would have gone into understanding that system can now be spent on the next challenge. A huge win, but there are a lot of ways to mess it up.

The pitfalls

Abstractions add a new layer of complexity, even if it's a layer attempting to hide complexity. A developer debugging an application will feel this complexity add up as they follow layers of indirection down through the abstractions. The added complexity needs to be weighed against the complexity the abstraction is trying to hide.

Abstractions are all or nothing. If an abstraction attempts to cover a lot of ground, the user implicitly accepts responsibility for the complexity of all that ground. The user might only need a small fraction of what an abstraction provides, but they are on the hook if any part breaks. This trade-off also needs to be weighed to determine whether the abstraction is worth it.

And finally, abstractions leak. Even very successful abstractions, like SQL over database implementations, leak implementation characteristics. For SQL, the same query will act a little differently depending on which implementation is being used under the hood. This is complexity the user has to be aware of whether or not they use the abstraction, but leaks can be extra confusing when an abstraction promises to cover them up.
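To make the SQL leak concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names are made up for illustration. Both statements are ordinary SQL, yet the results are specific to SQLite: MySQL's `/` operator returns a decimal for the same division, and PostgreSQL rejects the insert outright.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer division: SQLite truncates 1 / 2 to 0.
# MySQL's `/` operator returns a decimal (0.5) for the same query text.
half = conn.execute("SELECT 1 / 2").fetchone()[0]

# Typing: SQLite's flexible typing stores text in an INTEGER column
# (for ordinary, non-STRICT tables). PostgreSQL rejects this insert.
# `readings` and `steps` are hypothetical names for this example.
conn.execute("CREATE TABLE readings (steps INTEGER)")
conn.execute("INSERT INTO readings VALUES ('not a number')")
stored = conn.execute("SELECT steps, typeof(steps) FROM readings").fetchone()

print(half)    # 0
print(stored)  # ('not a number', 'text')
```

Nothing in the query text hints at either behaviour, which is exactly what makes a leak like this confusing.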

My fav

The most successful abstraction that I helped create at Fitbit was an algorithm development framework. Before the framework, researchers developed new algorithms on subsets of data on their laptops. These algorithms would then be translated to “production” code and serve insights to Fitbit users.

The researchers preferred working on their laptops for the quicker iterations. They could work on larger servers that could handle 100% of the data, but that introduced distributed computing complexities. The complexities slowed them down way more than the cost of only working on a subset of the data. But then there was the large cost to the company to translate the code to production, where we had to deal with distributed computing. Ideally, the code written by the researchers could be pushed directly to production. We wanted an abstraction which would hide the distributed computing complexities from the researchers, allowing them to quickly iterate on code which could be deployed to production.

Looking at the common pitfalls of abstractions above, we were flirting dangerously with number two: attempting to cover up all the complexities of distributed computing. We avoided this by narrowing the scope of what our framework abstraction provided. Without getting too far into the weeds, researchers had the full Python SDK at their disposal on their laptops, whereas the new framework had a much smaller offering. But it was just enough for the researchers to still do their job, and it allowed them to deploy to production, a very compelling offering.
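For a rough feel of what a narrow offering can look like, here is a toy sketch. It is not the Fitbit framework: every name in it is hypothetical, and a thread pool stands in for a real cluster. The only assumption is that the researcher writes one plain function over a single user's records, and the runner decides where that function executes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical contract: one user's records in, one insight out.
Records = List[float]
Algorithm = Callable[[Records], float]


def run_local(algorithm: Algorithm, data: Dict[str, Records]) -> Dict[str, float]:
    """Laptop runner: loop over a small sample in-process."""
    return {user: algorithm(records) for user, records in data.items()}


def run_parallel(algorithm: Algorithm, data: Dict[str, Records],
                 workers: int = 4) -> Dict[str, float]:
    """Stand-in for a production runner: same function, fanned out to workers."""
    users = list(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(algorithm, (data[user] for user in users))
        return dict(zip(users, results))


def mean_resting_heart_rate(records: Records) -> float:
    """The researcher's code: no knowledge of where or how it runs."""
    return sum(records) / len(records)


sample = {"user_a": [60.0, 62.0], "user_b": [70.0, 74.0]}
local = run_local(mean_resting_heart_rate, sample)
parallel = run_parallel(mean_resting_heart_rate, sample)
print(local == parallel)
```

The constraint is the point of the sketch: because the algorithm can only see one user's records at a time, the runner is free to spread users across machines without the researcher ever thinking about it.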

The framework was a thoughtful and precise layer balancing the complexities. Had it taken on too much, it would have promised more than it could deliver. Had it taken on too little, researchers would not have used it at all. This delicate balance needs to be struck by all abstractions.