Breaking down the thread abstraction
And I would play with fire to break the ice.
It was time to start breaking up the Monolith into micro-services. The Monolith was a massive application which contained all our project’s code and it was almost impossible to maintain. As we considered our options for our next generation tech stack, I heard the word “asynchronous” a lot.
We chose a micro-service framework which exposed an asynchronous API. This was a radical change from our old tech where each request had its own thread. Following the hip new trend, I thought our database driver should also use an asynchronous API.
There were performance and scaling benefits with the new tech, but there was also a large increase in bugs. What happened?
I needed to dive deep under the abstraction layers to understand what was going on. What is “asynchronous”? Is it worth it?
There Is No Thread
At the lowest level of computers everything is asynchronous. This was somewhat of a surprise after years of writing synchronous code. It probably has something to do with the world being asynchronous though.
The path from high-level language code to bits on the wire is complex. But just knowing the gist is helpful.
Some synchronous code makes a blocking I/O call. The language runtime library translates that into system calls to the kernel. The kernel instructs a hardware device, through a device driver, to perform the actual I/O. At this point the kernel moves on and the device is busy sending signals.
When the device finishes its I/O it interrupts the kernel. The kernel makes a note to pass that message back up to userland. The language runtime is waiting for that signal so the synchronous code can continue its “thread” of execution.
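As a rough illustration of that gist, here is a sketch that builds a "blocking" read out of a non-blocking socket plus an explicit wait for a readiness event. It is not how any particular language runtime is implemented. The names blocking_recv and demo are made up for this example, and a local socket pair stands in for a real device.

```python
import selectors
import socket


def blocking_recv(sock, selector, nbytes):
    """Emulate a blocking read on top of a non-blocking socket."""
    while True:
        try:
            return sock.recv(nbytes)
        except BlockingIOError:
            # Nothing has arrived yet: sleep until the kernel reports
            # that the socket is readable, then try again.
            selector.select()


def demo():
    reader, writer = socket.socketpair()  # stand-in for a real device
    reader.setblocking(False)
    selector = selectors.DefaultSelector()
    selector.register(reader, selectors.EVENT_READ)
    try:
        writer.sendall(b"ping")
        return blocking_recv(reader, selector, 4)
    finally:
        selector.close()
        reader.close()
        writer.close()


print(demo())
```

The caller of blocking_recv sees an ordinary synchronous function. The waiting on an event is hidden one layer down.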
So down at the lower levels there are no threads. Threads are a higher-level abstraction that developers have been working on for the past fifty years.
Back in the 70s the creation of this abstraction was a big deal. A debate raged between the computer science heavyweights on whether we should restrict our code to make it easier to reason about. This is when developers started to frown on GOTO jumps despite their power.
GOTO leads to spaghetti code which is hard to reason about. So developers created and adopted some structures to keep things simple: control flow (if/then/else), code blocks, and subroutines (functions and call stacks). These exist in all major languages today.
This also solidified the causality of code. If you see the following:
g();f();
you would assume that the function g is run before f. Programmers take this concept for granted these days.
Programmers have spent a lot of time and effort building up this “thread” concept. But these new fancy asynchronous APIs with their callbacks look an awful lot like a GOTO.
Performance and Scalability
Asynchronous implementations get sold on their performance and scalability. How much of each you actually get, though, depends on the use case.
Let’s take the case of a monolith application broken up into micro-services. A service gains performance if it can query other services in parallel. And a service is more scalable if it can service hundreds of I/O requests on one thread since it takes less memory.
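To make the first claim concrete, here is a small sketch using Python's asyncio. The call_service function and the service names are hypothetical, and a sleep stands in for network latency, so the timings only show the shape of the gain.

```python
import asyncio
import time


async def call_service(name, delay=0.05):
    # Hypothetical downstream call; the sleep stands in for network latency.
    await asyncio.sleep(delay)
    return f"{name}: ok"


async def one_at_a_time(names):
    # Each call waits for the previous one to finish.
    return [await call_service(n) for n in names]


async def in_parallel(names):
    # All calls are in flight at once; total time is roughly the slowest one.
    return await asyncio.gather(*(call_service(n) for n in names))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


SERVICES = ["users", "orders", "billing"]
serial_result, serial_time = timed(one_at_a_time(SERVICES))
parallel_result, parallel_time = timed(in_parallel(SERVICES))
print(f"serial {serial_time:.2f}s, parallel {parallel_time:.2f}s")
```

Both versions return the same answers. Only the time spent waiting differs.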
For the database driver, a single request rarely queried the database in parallel. But the service as a whole did perform hundreds of parallel I/O requests to the database. In our old tech stack, each request had its own thread. These threads would block on database I/O, wasting the memory resources they consumed while they waited.
The nature of this application meant it spent most of its time waiting on I/O. It would also spend some CPU marshaling data around, but was by no means CPU bound. This is a case where it makes sense to have one thread manage all this waiting and light CPU work.
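A sketch of that shape, again with asyncio and with made-up names (handle_request, serve). Two hundred simulated requests wait on a fake database, and the code records which operating system thread each one ran on.

```python
import asyncio
import threading


async def handle_request(i, seen_threads):
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(0.05)  # stand-in for waiting on the database
    seen_threads.add(threading.get_ident())
    return i * 2  # a little "marshaling" work


async def serve(n):
    seen_threads = set()
    results = await asyncio.gather(
        *(handle_request(i, seen_threads) for i in range(n))
    )
    return results, seen_threads


results, seen_threads = asyncio.run(serve(200))
print(len(results), "requests handled on", len(seen_threads), "thread(s)")
```

With a thread per request, the same workload would need two hundred stacks sitting idle while the database does its thing.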
So we use our limited resources more efficiently. But at what cost?
What Have We Lost
Our code is now full of callbacks. Callbacks shatter structured programming. Exception handling no longer works, because the try block has already exited by the time the callback runs. Try-with-resources no longer works for the same reason. And what’s worse is that these fail silently. The compiler isn’t going to tell you that the code you wrote won’t actually catch any exceptions.
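Here is a minimal sketch of that silent failure. It uses a toy event loop (just a queue of callbacks) rather than a real framework, and query_async, on_rows and handler are hypothetical names.

```python
from collections import deque


def demo():
    pending = deque()  # toy event loop: a queue of callbacks to run later
    log = []

    def query_async(sql, callback):
        # Hypothetical async driver call: returns at once, calls back later.
        pending.append(lambda: callback(f"rows for {sql}"))

    def on_rows(rows):
        raise ValueError("bad row")

    def handler():
        try:
            query_async("SELECT 1", on_rows)
            log.append("handler returned normally")
        except ValueError:
            # Looks reasonable, but on_rows never runs inside this try block.
            log.append("handler caught the error")

    handler()

    # The loop runs the callback after handler has already returned.
    while pending:
        callback = pending.popleft()
        try:
            callback()
        except Exception as exc:
            log.append(f"event loop saw: {exc}")
    return log


for line in demo():
    print(line)
```

The except clause in handler is dead code, and nothing warns you about it. The error surfaces in the event loop, far from the code that cares.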
We have also lost the free back pressure. If ten threads are running synchronous code and the database hiccups, all ten threads will pause. They will no longer accept new work and this back pressure propagates upstream. Asynchronous code keeps accepting new work even though none is getting done.
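A sketch of the difference, under some loud assumptions: the "database" is an event that never fires while requests arrive, the synchronous server rejects work when every worker is busy, and sync_server and async_server are names invented for this example.

```python
import asyncio
import threading


def sync_server(requests, workers):
    db_ready = threading.Event()  # unset: the database is stalled
    slots = threading.Semaphore(workers)
    threads, accepted, rejected = [], 0, 0

    def handle():
        db_ready.wait()  # the worker blocks on the database
        slots.release()

    for _ in range(requests):
        if slots.acquire(blocking=False):
            thread = threading.Thread(target=handle)
            thread.start()
            threads.append(thread)
            accepted += 1
        else:
            rejected += 1  # every worker is stuck: push back upstream

    db_ready.set()  # let the stalled workers finish so the demo can exit
    for thread in threads:
        thread.join()
    return accepted, rejected


async def async_server(requests):
    db_ready = asyncio.Event()

    async def handle():
        await db_ready.wait()

    tasks = [asyncio.create_task(handle()) for _ in range(requests)]
    await asyncio.sleep(0)  # give every task a chance to start and park
    in_flight = sum(not task.done() for task in tasks)
    db_ready.set()
    await asyncio.gather(*tasks)
    return in_flight


print(sync_server(requests=10, workers=3))
print(asyncio.run(async_server(requests=10)))
```

The threaded version says no after three requests. The asynchronous version happily holds all ten in memory, and would hold ten thousand just as happily unless someone adds a limit by hand.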
Arguably the worst loss, however, is causality. While some asynchronous frameworks guarantee that all code is run in one thread, removing a large set of concurrency bugs, it is not obvious in what order this code will be run. There are many different possible logical threads of execution.
g();f(); no longer means what a developer thinks it does.
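A small asyncio sketch of that surprise. Here g and f are started in source order, but each one waits on simulated I/O of a different length (the delays are arbitrary assumptions), so they finish in the opposite order.

```python
import asyncio


async def run_both():
    order = []

    async def g():
        await asyncio.sleep(0.05)  # g's I/O happens to be slow today
        order.append("g")

    async def f():
        await asyncio.sleep(0.01)
        order.append("f")

    # Started as g then f, just like the source reads...
    task_g = asyncio.create_task(g())
    task_f = asyncio.create_task(f())
    await asyncio.gather(task_g, task_f)
    return order  # ...but finished in whatever order the I/O completed.


print(asyncio.run(run_both()))
```

Nothing in the two lines that start g and f tells you which effect lands first. That depends on the network, not on the code.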
A Leaky Abstraction
Developers struggled with the loss of causality when migrating to the async API of the database driver. When was code being executed? And from where?
The callbacks were exposed as Futures (fancy callbacks) which readily accepted more GOTOs to be tacked on (in the form of functions). What wasn’t obvious was that these GOTOs would be run by the underlying event loop implementation. Slowing down the event loop thread caused hard-to-debug problems.
This burden of making sure code was being run in the correct spot was new. And as code gets more complex, keeping track of these logical threads of execution becomes more difficult.
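Our driver is not shown here, but the same trap can be sketched with Python's concurrent.futures.Future, where a callback runs on whichever thread completes the future. The thread named driver-event-loop is a hypothetical stand-in for the driver's internals.

```python
import threading
import time
from concurrent.futures import Future


def slow_callback(ran_on):
    def callback(future):
        ran_on.append(threading.current_thread().name)
        time.sleep(0.05)  # "just a bit" of work tacked onto the future
    return callback


def demo():
    first, second = Future(), Future()
    ran_on = []
    first.add_done_callback(slow_callback(ran_on))

    def event_loop():
        # Hypothetical driver loop completing two unrelated queries.
        start = time.perf_counter()
        first.set_result("rows for query 1")  # the callback runs right here
        second.set_result(time.perf_counter() - start)

    loop_thread = threading.Thread(target=event_loop, name="driver-event-loop")
    loop_thread.start()
    loop_thread.join()
    return ran_on, second.result()


ran_on, delay = demo()
print(f"callback ran on {ran_on[0]}; second query delayed {delay:.2f}s")
```

The code that attached the callback never touched the loop thread, yet its callback held up a completely unrelated query.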
The thread has become a leaky abstraction.
An Old Hope
So I don’t like asynchronous interfaces, but it’s undeniable that there are cases where operating system threads are not the best concurrency model.
Maybe coroutines are the best of both worlds.
Each of these “threads of execution” has its own stack. They have all the same characteristics as normal threads, but have the potential to be run concurrently.
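A sketch of what that buys back, using Python's async/await. One caveat: Python's coroutines are compiled into resumable functions rather than given their own stacks, so this is only an approximation of the idea. The query function is a hypothetical stand-in for a database call.

```python
import asyncio


async def query(sql):
    await asyncio.sleep(0)  # yield to the scheduler, as real I/O would
    if "bad" in sql:
        raise ValueError("bad query")
    return f"rows for {sql}"


async def handler():
    log = []

    async def g():
        log.append(await query("g"))

    async def f():
        log.append(await query("f"))

    try:
        await g()  # g finishes before f starts, exactly as it reads
        await f()
        await query("bad")
    except ValueError as exc:
        log.append(f"caught: {exc}")  # ordinary exception handling works
    finally:
        log.append("cleanup")  # and so does scoped cleanup
    return log


for line in asyncio.run(handler()):
    print(line)
```

The code reads top to bottom again, while the waiting still happens without tying up a thread.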
This isn’t free though. The Rust language actually used to have coroutines as first-class citizens, in the form of green threads, but removed them before its 1.0 release. This is because not only must the compiler have the ability to turn functions into state machines, but a runtime is needed to schedule these coroutines. Rust, being a low-level systems language, didn’t want the burden of this runtime scheduler.
A language like Go doesn’t mind it though. Maybe the future is here.