2021.01 Vol.1

Bit of monorepo

A quick little balefire.

I was once part of a developer holy war.

The team could not decide how to organize our code. Should it live in one repository, a monorepo, or should there be a repository per project?

The war was ignited by a challenge we were facing: scaling developer productivity as the team grew.

Everyone agreed that code organization could help combat this productivity loss, but which strategy should we take?

Much to the dismay of some, we ended up with the monorepo.

Productivity Breakdown

Developers face scaling challenges all the time, and some of those are easy to predict. An application might work fine for ten users, but we wouldn’t expect it to hold up to millions of users without some changes. Some challenges are not as obvious.

At one point in time, all the developers of Fitbit were in a single room. I took for granted a lot of properties that come with a team that size. We merged code straight to master and resolved conflicts in person. Even if other developers were not working on code related to a change, they had a good gut instinct on the effects it would have. These instincts allowed us to detect breaking changes before they got to production.

However, as the team grew, errors began to happen at an exponential rate in development and production.

It’s tough to say at what team size the project began to degrade: 10 devs, 30 devs, or 100 devs. But changes that used to be easy began to require hours of coordination and were error prone. The size of our team was taking a toll on productivity.

And that is when the monorepo versus multiple repo debate took off.

The Influence of Code Organization

Code organization has the potential to influence how easy or difficult it is for a developer to discover code, build code, and test code.

Discover: Where does this code live? Where is this code used?

Build: How do I build this project? How do I manage its dependencies?

Test: How do I test this code? How do I test the code that depends on this code? (this one is a biggy)

The assumption is that developer productivity remains high, even as the team grows, if these tasks remain easy. So which code organization strategy keeps these tasks the easiest?

Discover

Finding usages of code is marginally easier in a monorepo, since all code can be grep’d at once. But simple tools applied to the multirepo approach produce the same effect.
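
To make that concrete, here’s a toy Python sketch, not anything we actually ran; the repo names and the parse_heart_rate symbol are invented, and a real team would reach for grep or ripgrep instead. In a monorepo the search is one directory walk; in a multirepo setup it’s the same walk in a loop over checkouts.

    import os

    def find_usages(root, symbol):
        """Walk a checkout and collect files that mention the symbol."""
        hits = []
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        if symbol in f.read():
                            hits.append(path)
                except OSError:
                    continue
        return hits

    # Monorepo: one root to search.
    print(find_usages("monorepo/", "parse_heart_rate"))

    # Multirepo: same effect, just a loop over local checkouts (paths are hypothetical).
    for repo in ["tracker-sync/", "sleep-api/", "web-dashboard/"]:
        print(find_usages(repo, "parse_heart_rate"))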

Relative to the other tasks, it’s a wash. Even though the developer holy war email thread spent a lot of time on this topic, it’s a lot less meaningful than the others. So let’s learn from the past and move on to the more interesting cases.

Build and Test

Building and testing code is where a monorepo shines, because a monorepo can enable faster failures and avoid technical debt.

To enable faster failures, we must leverage a monorepo’s one inherent advantage: atomic commits. A developer can make a change affecting more than one project in one all-or-nothing (a.k.a. atomic) step. The multiple repository process to push out a change often follows the open source pattern. A developer patches a project and uploads it to a central location. At a later time, a dependent project pulls down the new version. There is a layer of indirection which forces the process to have multiple steps.

So to perform a library update with multiple repository code organization, waves of builds and tests have to be run. First, a developer patches the original library and publishes it to a central location. The downstream projects which depend on the library need to test the new version. A tool, or an unfortunate developer, needs to update the downstream projects and test them all. If there is a break, the original library patch needs to be rolled back.

And what about downstream projects of the downstream projects? The waves of building and testing continue. Each wave adds complexity and brittleness, especially in the face of rollbacks.
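
To make the waves visible, here’s a toy Python sketch. The dependency graph, project names, and the publish and test helpers are stand-ins rather than a real build tool, but the shape of the loop is the point: each wave publishes one layer, then updates and tests the layer below it.

    # Toy dependency graph: project -> projects that depend on it directly.
    DOWNSTREAM = {
        "lib-metrics": ["svc-sleep", "svc-steps"],
        "svc-sleep": ["web-dashboard"],
        "svc-steps": ["web-dashboard"],
        "web-dashboard": [],
    }

    def publish(project, version):
        print(f"publish {project}@{version}")

    def update_and_test(project, dep, version):
        print(f"{project}: bump {dep} to {version} and run tests")
        return True  # pretend the tests passed

    def release_in_waves(changed, version):
        """Each wave publishes one layer, then updates and tests the next layer down."""
        wave, seen = [changed], {changed}
        while wave:
            next_wave = []
            for project in wave:
                publish(project, version)
                for dependent in DOWNSTREAM[project]:
                    if not update_and_test(dependent, project, version):
                        raise RuntimeError(f"break found: roll back {project}@{version}")
                    if dependent not in seen:
                        seen.add(dependent)
                        next_wave.append(dependent)
            wave = next_wave

    release_in_waves("lib-metrics", "1.4.0")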

Using atomic commits in a monorepo, we avoid the waves of builds and tests. Instead of publishing a new version of the library and then coordinating the testing of affected projects, we do it in one step. The library and all affected projects are tested at the single revision containing the change. This allows dependent projects to fail fast on changes.
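
The monorepo flow, sketched over the same toy graph: compute the affected set from the change and test everything at the one revision that contains it. Again, the names are invented and run_tests stands in for whatever build tool you use.

    DOWNSTREAM = {
        "lib-metrics": ["svc-sleep", "svc-steps"],
        "svc-sleep": ["web-dashboard"],
        "svc-steps": ["web-dashboard"],
        "web-dashboard": [],
    }

    def affected_by(changed):
        """Transitive closure of projects downstream of the changed one."""
        affected, stack = {changed}, [changed]
        while stack:
            for dependent in DOWNSTREAM[stack.pop()]:
                if dependent not in affected:
                    affected.add(dependent)
                    stack.append(dependent)
        return affected

    def run_tests(project, revision):
        print(f"test {project} at revision {revision}")
        return True  # pretend the tests passed

    revision = "abc123"  # the single commit containing the library change
    if all(run_tests(p, revision) for p in sorted(affected_by("lib-metrics"))):
        print("merge the atomic commit")
    else:
        print("fix the break before it lands; nothing to roll back")

A break shows up before the commit lands, so there is no published version to chase down and roll back.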

Avoiding Debt

If a developer is used to the open source, multiple repository model, this monorepo approach sounds like a lot of work. To update a library I have to update all dependent projects at the same time? Why me? The answer is you, because the developer best equipped to deal with a breaking change is the one making the change.

An Unnecessary Interface

At some level of scale it makes sense to break a monolith application into microservices. Moving to microservices accepts that the complexity of the system increases versus a monolith (more than one live version of the code, service discovery, load balancing). But in this case, the complexity can be worth it.

Is there added complexity for multirepos? The trials of building and testing code exist, but there is also a social element. Conway’s Law states that the structure of the systems people build reflects the communication structure of the people who build them. In software engineering, this often manifests itself as code interfaces between projects. And these interfaces are often where bugs occur.

Multiple repository code organization encourages another interface within a system, whereas a monorepo discourages it. One less interface to cause problems.

Embrace the Monorepo

A monorepo has its faults and doesn’t solve everything. But in some scenarios (e.g. group of developers with highly shared interests working on tightly coupled libraries and services), it has the higher potential to maintain developer productivity.

P.S. Deploying Code

As soon as a project requires more than one machine to run on, it will have to deal with artifact versioning in production.

It sounds weird to have a monorepo with microservices, but code deployment and code organization are orthogonal strategies.
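
As a rough sketch of that orthogonality (the service names and registry URL are made up): even with one repo, each service can still ship as its own artifact, versioned by the commit it was built from, and be deployed or rolled back on its own.

    import subprocess

    SERVICES = ["svc-sleep", "svc-steps", "web-dashboard"]  # hypothetical service directories

    def current_revision():
        """Use the monorepo commit as the artifact version (run inside a git checkout)."""
        return subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()

    def build_artifacts():
        rev = current_revision()
        # Each service ships as its own artifact; deployment stays per-service
        # even though the code organization is a single repository.
        return {service: f"registry.example.com/{service}:{rev}" for service in SERVICES}

    print(build_artifacts())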