2021.01 Vol.2

Bit of dependencies

Operator! Get me the president of the world!

Dependencies are simultaneously the greatest and scariest part of building software.

They allow developers to reuse code, a pillar of software engineering, and no doubt the reason software is created and changed so quickly.

Dependencies can be code developed by another company, like the omnipresent Google Guava library. Or they can be an internal library that developers publish for other teams.

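But dependencies tend to have dependencies of their own, and managing the resulting graph is intractable in the general case. Picking a set of versions that satisfies every constraint is NP-Complete, the same class as Boolean satisfiability, that scary stuff from CS103 Algorithms.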

The Diamond Problem

An example of this difficulty is the “Diamond Dependency Problem”. Library A depends on libraries B and C. B and C both depend on library D, but on different versions of D. Which version of library D should library A use?

A +---> B ---> Dv1
  |
  +---> C ---> Dv2
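To make the diagram concrete, here is a minimal sketch that walks a dependency graph and reports every package requested at more than one version. The graph layout, the version strings, and the function names are all made up for illustration; real build tools track far more (version ranges, scopes, exclusions).

```python
from collections import defaultdict

# Hypothetical graph mirroring the diagram: package -> list of (dependency, version).
GRAPH = {
    "A": [("B", "1.0"), ("C", "1.0")],
    "B": [("D", "1.0")],
    "C": [("D", "2.0")],
    "D": [],
}


def requested_versions(graph, root):
    """Walk the graph from root and record every version requested for each package."""
    seen = defaultdict(set)
    visited = set()
    stack = [root]
    while stack:
        package = stack.pop()
        if package in visited:
            continue
        visited.add(package)
        for dependency, version in graph.get(package, []):
            seen[dependency].add(version)
            stack.append(dependency)
    return seen


def find_conflicts(graph, root):
    """Return only the packages that are requested at more than one version."""
    return {
        name: sorted(versions)
        for name, versions in requested_versions(graph, root).items()
        if len(versions) > 1
    }


conflicts = find_conflicts(GRAPH, "A")
print(conflicts)  # {'D': ['1.0', '2.0']}
```

Detecting the diamond is the easy part. Deciding what to do about it is where the heuristics come in.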

Programmers have implemented different strategies and heuristics to deal with the “Diamond Dependency”, and none has solved it. Build tools like Gradle can recognize the problem, but after recognizing it, Gradle picks one of the versions (by default the newest) and tosses it on the classpath. There is no guarantee that version will work with both B and C, and failures often surface only at runtime.
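A "newest wins" heuristic can be sketched in a few lines. This is not Gradle's actual implementation, only an illustration of the idea; the function names and the plain dotted-number version format are assumptions.

```python
def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def newest_wins(requests):
    """requests maps a package name to every version requested for it."""
    return {
        name: max(versions, key=parse_version)
        for name, versions in requests.items()
    }


resolved = newest_wins({"D": ["1.9.0", "1.10.0"], "B": ["1.0.0"]})
print(resolved)  # {'D': '1.10.0', 'B': '1.0.0'}
```

Note what the sketch never does: it never checks that the library that asked for the older version still works with the newer one. That is the gap where runtime failures come from.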

NPM takes a different approach. When versions conflict, each dependency can keep its own nested copy of the dependency it asked for, and these copies are not shared. Library B has its own library D and library C has its own version of D. This is kicking the can down the road. Now developers discover breaking changes at runtime in horrible fashion. For example, library A asks for a model object from library B and passes it to library C. What if library D defines this model? There are now two versions of library D’s model floating around, which can lead to awful serialization problems.
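The two-copies problem can be simulated without any package manager. In this sketch, `load_library_d` is a hypothetical stand-in for loading a private copy of library D; every call defines a brand new `Model` class, the same way two nested copies of a package would.

```python
def load_library_d(version):
    """Simulate loading a private copy of library D; each call defines a new Model class."""

    class Model:
        def __init__(self, name):
            self.name = name
            self.schema_version = version

    return Model


# Library B and library C each bundle their own copy of D.
ModelFromB = load_library_d("1.0")
ModelFromC = load_library_d("2.0")


def library_c_accepts(obj):
    """Library C only recognizes the Model class from its own copy of D."""
    return isinstance(obj, ModelFromC)


model = ModelFromB("example")        # library A gets a model from library B
accepted = library_c_accepts(model)  # ...and passes it to library C

print(ModelFromB.__name__ == ModelFromC.__name__)  # True: same class name
print(accepted)                                    # False: different class
```

The object looks right and has the right name, but library C's copy of D does not recognize it. Nothing fails until the two libraries actually meet at runtime.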

Avoid Exposure

One extreme strategy to avoid dependency hell is to treat each version of a dependency as a new dependency. The trade-off is massive overhead (updating all package imports) every time a dependency is upgraded. This sits on the far end of the spectrum and is probably not worth it.
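As a rough sketch of what "each version is a new dependency" looks like in practice, the version can be baked into the package name so that both versions coexist. The module names `libd_v1` and `libd_v2` and the `register` helper are hypothetical; the modules are built in memory only so the example runs on its own.

```python
import sys
import types


def register(module_name, render):
    """Install a tiny in-memory module so the example runs without real packages."""
    module = types.ModuleType(module_name)
    module.render = render
    sys.modules[module_name] = module


# Hypothetical library D published twice, with the major version in the package name.
register("libd_v1", lambda name: "Hello " + name)
register("libd_v2", lambda name: {"greeting": "Hello", "name": name})

import libd_v1  # library B keeps importing the old name
import libd_v2  # library C had to rewrite its imports to upgrade

old_result = libd_v1.render("world")
new_result = libd_v2.render("world")
print(old_result)  # Hello world
print(new_result)  # {'greeting': 'Hello', 'name': 'world'}
```

Both versions load side by side without conflict, but every upgrade means touching every import statement that names the old version.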

The goal is to limit exposure to the set of problems caused by multiple versions of a dependency existing in a system.

If you control all the applications that make up a system, a single version of each dependency can be forced across the system. This sits somewhere in the middle of the spectrum, which has the wild west on one end and each version as a new dependency on the other.

Another mitigation strategy is to limit the interface through which applications communicate. Instead of relying on a library directly, an application can make an RPC through a narrowly scoped interface such as a Thrift model. The Thrift IDL is designed specifically to keep a schema simple even as things evolve. This helps avoid conflicts between library-defined models at runtime, but errors will always be possible. The best hope is to limit them while still taking advantage of dependencies.
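The idea of a narrow boundary can be sketched without Thrift itself. Here plain JSON stands in for the wire schema, and the `UserV1` and `UserV2` classes, their fields, and the `encode`/`decode` helpers are all invented for illustration. Each side keeps its own model class, and only simple fields cross the boundary.

```python
import json
from dataclasses import dataclass


@dataclass
class UserV1:
    """Hypothetical model as the sending application's copy of the library defines it."""
    name: str


@dataclass
class UserV2:
    """Hypothetical model in the receiver's newer copy; it gained an optional field."""
    name: str
    email: str = ""


def encode(user):
    """Sender side: only plain, schema-defined fields cross the boundary."""
    return json.dumps({"name": user.name})


def decode(payload):
    """Receiver side: rebuild its own model, defaulting fields the sender did not send."""
    fields = json.loads(payload)
    return UserV2(name=fields["name"], email=fields.get("email", ""))


received = decode(encode(UserV1(name="operator")))
print(received)  # UserV2(name='operator', email='')
```

Neither application ever sees the other's class, so two versions of the model can coexist. The schema itself can still change in incompatible ways, which is why this limits errors rather than eliminating them.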