// Giving meaning to these bits
Meaning
- Syntax – Structure or grammar of a language.
- Semantics – Meaning of a program.
The man bought the infinity from the store.
Syntax is correct, but semantically meaningless.
Concurrency
What work should the program run next? Who is driving? Concurrency is a bit of a crossroads in programming. I have struggled with why it’s so difficult to even wrap my head around the challenges.
First off, concurrency versus parallelism. Concurrency is about dealing with multiple things efficiently. Parallelism is doing multiple things at once with a focus on performance. So parallelism is more about “turning it up to 11”, throwing more resources at a problem. Concurrency is about breaking a problem up so that a program isn’t ever sitting around doing nothing, waiting for things outside its control (like I/O). Lots of times this “break up” also enables parallelism if desired.
Even with a focus on just concurrency, there are a few complexities that are hard to tease apart.
- Concurrency syntax of a language is facing an uphill semantic battle. For better or for worse, humans usually “think” things execute in serial, top to bottom. Text is naturally a serial, 1-dimensional representation, whereas the world has more possibilities!
- Languages come bundled with a runtime to help choose what is next to run, but these implementations generally “bleed” through to the programmer. Kinda like how bad garbage collectors force a programmer to be more aware of memory management.
- This is all made even more complex by many layers of scheduling from the language runtime down to the kernel.
there is a thread
A thread is the standard concurrency abstraction exposed by an Operating System (OS), more specifically the kernel program of the OS. Threads give the impression to the caller that everything is happening in a “thread” of serial tasks. Threads are naturally implemented with function calls on a stack.
Concurrency runtime schedulers come in two flavors:
- preemptive – Context switches are dictated by the runtime.
- cooperative – Context switches occur only when processes voluntarily yield control.
Operating systems are designed to be general purpose, but have any ever exposed a different scheduling runtime? I think because they are so general, they need to be preemptive. No one would trust a cooperative OS.
Languages usually bundle their own runtimes which layer on top of the preemptive thread abstraction exposed by the kernel. Language runtimes are where there is a lot more flexibility, and there are many examples of preemptive and cooperative interfaces for programmers. A language runtime could choose to keep things super simple and just expose the kernel threads as a 1:1 concurrency primitive. This essentially delegates all the hard parts to the kernel. A language could also go the opposite route to squeeze out efficiency and create a new abstraction layer across the kernel threads.
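As a quick sketch of that 1:1 route (in Rust, whose `std::thread` maps directly onto kernel threads), delegating all the scheduling to the kernel looks like this:

```rust
use std::thread;

fn main() {
    // Each spawn asks the kernel for a new thread (1:1 model); the kernel's
    // preemptive scheduler decides when each one actually runs.
    let handles: Vec<_> = (0..4)
        .map(|i| thread::spawn(move || format!("work done on thread {i}")))
        .collect();

    for handle in handles {
        // join() blocks the calling thread until the spawned thread finishes.
        println!("{}", handle.join().unwrap());
    }
}
```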
there is no thread
If you spend all your time in “high level” languages, e.g. Golang, you might assume that there are threads all the way down to the hardware CPU. But it is a different world down there. A program is generally trying to solve one goal, whereas the kernel is trying to balance multiple programs, with different requirements, across limited resources (CPU cycles, memory). The kernel is also much closer to the bare-metal and has to deal with more real-world physics issues which are generally abstracted away at the higher levels.
The CPU really can only do one instruction at a time (per core). A kernel preemptively weaves together operations from different threads to try and distribute resources fairly with `TRAP`s and `SRET`s (returns).
The kernel also deals with the other hardware devices, including those which perform I/O like network cards. If a CPU issues an operation that performs a network request, it is probably going to take a ton of cycles to complete (think TRON time). The CPU could just sit there and wait (thread style?), but that is super inefficient. It could instead insert some polling operations into its queue to check if the request is done, and free itself to do other work in the meantime. Much better, but still wasted cycles on polling. One other option is to just keep doing other work until the device is done and then get “interrupted” to perform the next task with the data. This last option is the most efficient, but requires some agreed-upon interface between the I/O devices and the CPU.
These interfaces are much “harder” than software interfaces in higher level languages. They involve actual “lines”, like the interrupt request line, to transmit signals as well as hardcoded memory locations which contain handler tables (e.g. when this happens, do this) like the interrupt descriptor table. Often, these tables are populated at boot-time with close cooperation between the CPU and the kernel.
asynchronous
A language or a library defines an asynchronous interface for how it abstracts concurrent operations. While some languages just expose kernel threads to solve this issue, that implementation isn’t considered “asynchronous” (more multithreaded-ish). An asynchronous interface is attempting to get meta information from the programmer as to how to break-up a program to be run concurrently by any runtime that satisfies the interface requirements.
Kernels expose interfaces which make it easier to implement asynchronous runtimes, like the `epoll` syscalls.
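As a rough sketch of what sitting on top of that interface looks like (assuming Rust with the `libc` crate on Linux), the heart of an asynchronous runtime’s event loop is something like:

```rust
use std::os::unix::io::RawFd;

// Minimal sketch of the reactor loop at the heart of many async runtimes:
// register a file descriptor with epoll, then sleep until the kernel says
// it is ready, instead of burning cycles polling it ourselves.
fn wait_until_readable(fd: RawFd) -> std::io::Result<()> {
    unsafe {
        let epfd = libc::epoll_create1(0);
        if epfd < 0 {
            return Err(std::io::Error::last_os_error());
        }

        // Tell the kernel which fd and which readiness events we care about.
        let mut ev = libc::epoll_event {
            events: libc::EPOLLIN as u32,
            u64: fd as u64, // user data so we know which fd woke us up
        };
        if libc::epoll_ctl(epfd, libc::EPOLL_CTL_ADD, fd, &mut ev) < 0 {
            return Err(std::io::Error::last_os_error());
        }

        // Block here; the kernel wakes us when `fd` has data to read.
        let mut ready = libc::epoll_event { events: 0, u64: 0 };
        if libc::epoll_wait(epfd, &mut ready, 1, -1) < 0 {
            return Err(std::io::Error::last_os_error());
        }

        libc::close(epfd);
    }
    Ok(())
}
```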
The `async` and `await` syntax has become popular in many languages. What it actually means is language dependent since it is a bridge between the programmer, the compiler, and usually a runtime. Rust is special in that it doesn’t supply a default runtime. But `async` and `await` are ways for the programmer to tell the compiler, “hey, you can break up this chunk of code into a future with these yield points”.
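A tiny sketch of that in Rust (the `read_packet` function here is made up, standing in for real asynchronous I/O):

```rust
// Each `.await` marks a point where the compiler may suspend this future
// and let the runtime run something else in the meantime.
async fn read_packet() -> Vec<u8> {
    vec![0u8; 16] // stand-in for real asynchronous I/O
}

async fn handle_connection() -> usize {
    let first = read_packet().await;  // yield point #1
    let second = read_packet().await; // yield point #2
    first.len() + second.len()
}
```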
A future is a reification of “something which will be completed in the future”. Work can be completed in all sorts of ways, but this concrete thing represents the state of that work. It is a nice meeting ground for programmers to have granular control of the flow of work being done. These are often implemented as state machines, where each state maps to a yield point.
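To make that concrete, here is a hand-rolled sketch of the kind of state machine the compiler generates, with one enum variant per resumption point (the `Countdown` future and the `block_on` driver from the `futures` crate are just illustrative choices):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// One variant per resumption point; only the data needed to continue
// from that point is stored, not a whole call stack.
enum Countdown {
    Running(u32),
    Done,
}

impl Future for Countdown {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // Countdown has no self-references, so it is Unpin and we can grab
        // a plain mutable reference to the state.
        let this = self.get_mut();
        match *this {
            Countdown::Running(0) => {
                *this = Countdown::Done;
                Poll::Ready(())
            }
            Countdown::Running(n) => {
                *this = Countdown::Running(n - 1);
                // Yield: ask to be polled again, then hand control back.
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            Countdown::Done => panic!("future polled after completion"),
        }
    }
}

fn main() {
    // Any executor can drive it to completion.
    futures::executor::block_on(Countdown::Running(3));
}
```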
Futures must be passed to a runtime to “drive” them to completion, otherwise they are just state objects sitting there. The compiler and the runtime are usually coupled because design choices need to be made: how much state is stored in the future? How much of the environment is captured when it is `await`ed upon?
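For example (using `block_on` from the `futures` crate as the simplest possible driver):

```rust
async fn compute() -> u32 {
    // Creating this future does no work; it is just a state object.
    21 * 2
}

fn main() {
    let fut = compute(); // nothing has run yet
    // Some runtime has to drive the future to completion.
    let answer = futures::executor::block_on(fut);
    assert_eq!(answer, 42);
}
```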
A future can be composed with other futures into one big future. This is what happens to a program compiled with async/await syntax. One can imagine the program is a big state machine, or maybe a tree of possible states. The root future probably has some sub-futures (async function calls). At some point, a future has to block and actually wait on some work; the biggest examples of these are the I/O libraries, maybe using `epoll`. These are “leaf” futures. Programmers are generally orchestrating the middle layer futures, not re-writing leaf futures.
The key to developing with async/await is keeping the stackless trade-offs in mind. Whether explicit or implicit, some runtime will have to drive the future completion. The future’s state transitions are comparable to function calls on the call stack, but they are more of a “rolled up” form. Where function calls show the exact path a program arrived at a state, a future’s state has the bare-minimum information required to pick up at a spot. It doesn’t show how it got there and there could be many possible paths. There is no longer the built-in debugging tool of a stacktrace. The one-dimensional nature of written code no longer lines up with a deterministic order of execution.
stackless vs. stackful
There are so many different types of runtimes and implementations that there is no perfect divide of characteristics and trade-offs. But stackless vs. stackful captures a lot of the challenges. In the stackful abstraction, each concurrent task has its own stack. The task can be suspended at any point in the program since the whole state is preserved. Stackless tasks share a call stack, so they can’t generally be suspended at arbitrary points. Instead, they have pre-defined suspension points which persist just the required amount of data to make it to the next suspension point.
Some areas of trade-offs: memory usage, suspension performance, developer productivity, compiler optimizations. There are a lot here, and optimizing for one usually negatively impacts some of the others, as is often the case with interesting challenges. There is not a perfect balance since different scenarios will favor different trade-offs.
Food for thought: would you consider the kernel concurrency model stackless or stackful? On one hand, it is exposing the thread abstraction which is entirely stackful for the caller. Every task (a.k.a. thread in this case) has its own stack. But under the hood, context switches involve swapping out a limited set of registers which feels a little more stackless…
sans-I/O
Writing code which can be used with synchronous and asynchronous runtimes can be a bit tricky. One part of the challenge is the “function color” problem: asynchronous functions take advantage of the compiler to do some work with `async`/`await`, which requires async functions to be called by async functions (or a runtime driver). This makes it hard to hide a stream of data behind a common abstraction for synchronous and asynchronous pulls of data.
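A small sketch of the color mismatch (the `fetch_data` function is made up, and `block_on` from the `futures` crate stands in for “a runtime driver”):

```rust
async fn fetch_data() -> Vec<u8> {
    vec![1, 2, 3] // stand-in for real asynchronous I/O
}

fn sync_caller() -> Vec<u8> {
    // fetch_data().await;        // does not compile: `.await` needs an async context
    futures::executor::block_on(fetch_data()) // a runtime driver bridges the gap
}

fn main() {
    println!("{:?}", sync_caller());
}
```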
Network protocols are a specific domain where there is usually a close coupling of the transport layer and the communication protocol for efficiency. This coupling makes it really hard to write transport (e.g. async vs. sync) agnostic code. Some patterns, sometimes referred to as “Sans I/O”, can maybe help out here though.
It makes sense that the library would want to abstract the I/O part, just call some general `read()` function, and then focus on its biz logic. If we can’t settle on an abstraction layer at that point though, can we pull the abstraction layer down so each I/O model gets its own implementation layer? So there might be the core “lib” library, and then a handful of I/O libraries like “lib-async”, “lib-sync”, etc.
Protocols though generally want to pull very specific amounts of bytes off the wire and deserialize them into some relevant packet. The logic wants to “drive” the I/O. There are gonna be some challenges separating these.
The simplest idea (thinking in Rust terms) is a function that takes a reference to read from, `&[u8]`, and a mutable reference to write into, `&mut [u8]`. Usually the protocol code drives the I/O, but what if that gets inverted and the calling code drives the protocol? The calling code will be responsible for pulling from the I/O data stream and then advancing the protocol state machine. There might be some trickiness here though: what if it feeds the protocol too many bytes for a certain step?
The protocol implementation no longer drives its own I/O but instead has data passed to and from it using in-memory buffers of bytes.
Buffering incomplete data in the protocol state machine is now the big challenge.
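Here is a rough sans-I/O sketch of that shape (the framing is made up: one length byte followed by the payload), where the caller does all the reading and the protocol only buffers and parses:

```rust
// A protocol state machine that never touches a socket: the caller feeds it
// bytes it read from *somewhere* (sync or async) and asks it for frames.
struct Protocol {
    buffer: Vec<u8>, // holds incomplete data between feed() calls
}

impl Protocol {
    fn new() -> Self {
        Protocol { buffer: Vec::new() }
    }

    /// Caller pushes whatever bytes it happened to receive, even partial frames.
    fn feed(&mut self, bytes: &[u8]) {
        self.buffer.extend_from_slice(bytes);
    }

    /// Caller pulls out the next complete frame, if the buffer holds one.
    fn next_frame(&mut self) -> Option<Vec<u8>> {
        let len = *self.buffer.first()? as usize;
        if self.buffer.len() < 1 + len {
            return None; // incomplete: wait for the caller to feed more bytes
        }
        let frame = self.buffer[1..1 + len].to_vec();
        self.buffer.drain(..1 + len);
        Some(frame)
    }
}

fn main() {
    let mut proto = Protocol::new();
    proto.feed(&[3, b'a', b'b']); // too few bytes for the first frame
    assert!(proto.next_frame().is_none());
    proto.feed(&[b'c', 2, b'x', b'y']); // completes frame 1, plus frame 2
    assert_eq!(proto.next_frame(), Some(vec![b'a', b'b', b'c']));
    assert_eq!(proto.next_frame(), Some(vec![b'x', b'y']));
}
```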
syscalls
When to use buffers? You don’t wanna make a bunch of trips to the grocery store only picking up one item at a time; “buffer” syscalls the same way.
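For example (in Rust), `BufReader` makes one big trip and then hands out data from memory (reading `/etc/hosts` here is just an example on a Unix-like system):

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // Without buffering, every tiny read could be its own syscall.
    // BufReader grabs a big chunk in one trip and hands out lines
    // from memory until the chunk runs out.
    let file = File::open("/etc/hosts")?;
    let reader = BufReader::new(file);

    for line in reader.lines() {
        println!("{}", line?);
    }
    Ok(())
}
```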
Types
Typing is semantics. The programmer gives the compiler more information to give it a chance to check if the programmer is doing what they think they are doing.
Bits are just symbols. The bits need a type to give them meaning and to say how to convert them to other symbols (bits).
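For example (in Rust), the same 32 bits mean completely different things depending on which type interprets them:

```rust
fn main() {
    let bits: u32 = 0x4048_F5C3;

    // Same bits, three different meanings depending on the type we assign.
    let as_int = bits;                   // the integer 1_078_523_331
    let as_float = f32::from_bits(bits); // the float ~3.14
    let as_bytes = bits.to_be_bytes();   // the bytes [0x40, 0x48, 0xF5, 0xC3]

    println!("{as_int} {as_float} {as_bytes:?}");
}
```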
structural vs. nominal
- Structural – Implicit relationship.
- Nominal – Explicit relationship.
Generally, I prefer explicit over implicit language features. Golang’s structural typing still allows for clean separations of concerns despite its implicit nature. The “accept interfaces and produce structs” mantra is even based on it.
In structural-land, you can declare an interface which just so happens to be implemented by your dependency. It makes it super easy to mock it out in unit tests. In nominal-land, you have to declare an interface and then create a new type which implements it and delegates to the dependency. I guess just more boilerplate when it comes to unit tests.
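A sketch of the nominal-land boilerplate in Rust (the `UserStore` trait and its types are made up):

```rust
// In nominal-land you declare the interface explicitly...
trait UserStore {
    fn fetch(&self, id: u32) -> String;
}

// ...then write a named type that implements it and delegates to the real
// dependency (the boilerplate structural typing lets you skip).
struct RealStore; // stand-in for the actual database client

impl UserStore for RealStore {
    fn fetch(&self, id: u32) -> String {
        format!("real user {id} from the database")
    }
}

// The mock for unit tests is just another named implementation.
struct MockStore;

impl UserStore for MockStore {
    fn fetch(&self, _id: u32) -> String {
        "canned test user".to_string()
    }
}

fn greet(store: &dyn UserStore, id: u32) -> String {
    format!("hello, {}", store.fetch(id))
}

fn main() {
    println!("{}", greet(&RealStore, 1));
    println!("{}", greet(&MockStore, 1));
}
```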
variance
Variance adds hierarchy to the type structure.
- Invariant – Behavior means that only that exact type is acceptable. Many functional languages lack subtypes, and so are generally invariant.
- Covariant – Behavior means subtypes are acceptable in lieu of the specified type. Many object-oriented languages have a default preference for covariance.
- Contravariant – Behavior means supertypes are acceptable substitutions in this type. Anytime covariance is possible, contravariance also becomes possible.
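A rough sketch of covariance and contravariance using Rust lifetimes, the main place Rust has subtyping (function names here are made up):

```rust
// `'static` outlives any `'a`, so `&'static str` is a subtype of `&'a str`.

// Covariance: a longer-lived reference is accepted where a shorter-lived
// one is asked for.
fn wants_any_lifetime(s: &str) -> usize {
    s.len()
}

// Contravariance: a callback that handles *any* lifetime is accepted where a
// callback handling only `&'static str` is asked for (function arguments flip
// the direction of the substitution).
fn wants_static_callback(f: fn(&'static str)) {
    f("static data");
}

fn handles_any(_: &str) {}

fn main() {
    let long_lived: &'static str = "hello";
    println!("{}", wants_any_lifetime(long_lived)); // covariant substitution
    wants_static_callback(handles_any);             // contravariant substitution
}
```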