Derived References

// #Rust

Over the past year, I have run up against some pain points using async features in Rust. I tripped over cancellation safety and channel-based event loops. I even tweaked some mutex usage in the spellchecker I use, Harper, so maybe I am not the only one? My theory is that it has something to do with lifetimes getting crazy with futures.

While the .await syntax makes the code look synchronous, there is some heavy magic going on under the hood to make that happen. I think you have to develop a feel for that machinery, because it bleeds through. So, how does that magic interact with lifetimes?

Lifetimes usually follow lexical scopes, as in, you can see beginnings and ends just by looking at the curly braces {...}. You don’t have to run the code; it can be statically analyzed. It is so simple we can just visualize the lifetimes. I think this simplicity comes from how well the stack fits the picture. The compiler can replicate what the stack will do at runtime in order to verify ownership and borrow lifetimes. Control flow is relatively constrained: things execute in a linear-ish fashion, and at some point the stack “unwinds” back to where the data was initialized.

Not the case for async. The compiler turns each async function into a potentially huge state machine. And the state machine does not need to finish in one go. When a future is awaited with .await, that can translate into many small, implicit pushes forward before it finally resolves and execution moves on to the next instruction below the original .await. Other code can run in between those implicit steps. That is the power of async! But this dynamic, runtime element adds complexity to the lifetime analysis.

fn synchronous() {
    let x = String::from("hello");
    let reference = &x;        // Borrow begins here...
    println!("{}", reference); // ...and ends here, inside one lexical scope.
}

async fn asynchronous() {
    let x = String::from("hello");
    let reference = &x;
    do_something().await;      // Borrow is held across this await point.
    println!("{}", reference);
}

How much changes when we hold that reference to x across the await point?

Any code in between await points in a future runs just like normal. The trickiness comes when a shared or exclusive reference is held across an await point. The instruction after each await point is a spot the future can potentially hop right back to, with no stack context! So, kind of like a closure, the future needs to stash the relevant context somewhere so that it can perform this miracle.

This is where I think things get weird. A reference that in synchronous code looks like it lives for a small three-line function is suddenly embedded in some huge state machine and could live for a really long time. But does the compiler just handle this all super well? Can it still analyze the lifetimes? Should I not worry about this?

My mental model for a bunch of nested async function calls is one huge future. I am pretty sure this is what the compiler actually does under the hood: it compiles each function separately into a function that returns a future state machine, and then wires them all together into one big future. A future chain, or to use the more common name, a task.

Take this other state machine, and run it to completion as part of my state machine.

// Rough sketch: each LevelNFuture here stands for a future whose internal
// state is the matching LevelNFutureState enum.
enum MainFutureState {
    Start,
    WaitingOnLevel1 { level1_future: Level1Future },
    Done,
}

enum Level1FutureState {
    Start,
    WaitingOnLevel2 { level2_future: Level2Future },
    FinishingUp { level2_result: i32 },
    Done,
}

enum Level2FutureState {
    Start,
    WaitingOnLevel3 { level3_future: Level3Future },
    FinishingUp { level3_result: i32 },
    Done,
}

enum Level3FutureState {
    Start,
    Done,
}

Rough idea of the async state machines and how they are wired together.
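
To make the “wired together” part concrete, here is a minimal hand-rolled sketch of the outer future driving the inner one. This is not what the compiler literally emits (the real generated state machines are unnameable types, and these names are made up), but it shows the shape of the delegation: poll the child until it is ready, then move on to your own next state.

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Stand-in for the innermost future: it resolves immediately with a value.
struct Level1Future;

impl Future for Level1Future {
    type Output = i32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<i32> {
        Poll::Ready(42)
    }
}

struct MainFuture {
    state: MainFutureState,
}

enum MainFutureState {
    Start,
    WaitingOnLevel1 { level1_future: Level1Future },
    Done,
}

impl Future for MainFuture {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let this = self.get_mut(); // fine here: nothing in this sketch needs pinning
        loop {
            match &mut this.state {
                MainFutureState::Start => {
                    // Hitting the await: create the child future and store it
                    // inside our own state machine.
                    this.state = MainFutureState::WaitingOnLevel1 {
                        level1_future: Level1Future,
                    };
                }
                MainFutureState::WaitingOnLevel1 { level1_future } => {
                    // Delegate: run the child state machine as part of our own poll.
                    match Pin::new(level1_future).poll(cx) {
                        Poll::Ready(_result) => this.state = MainFutureState::Done,
                        Poll::Pending => return Poll::Pending,
                    }
                }
                MainFutureState::Done => return Poll::Ready(()),
            }
        }
    }
}

Each level does the same delegation one layer down, and the whole chain of nested async calls collapses into one big task.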

What are some possible worst-case scenarios? On the plus side, memory is freed from completed futures (e.g. Level2Future once Level1Future is done with that step), and the compiler only hangs onto data in a future that it thinks is required across await points. So what are the worst types of data, the ones that cause the most pain? And does composing futures in certain ways cause extra pain? If I were a compiler validating lifetimes, what would I hate to see…

Something like a select! block obviously adds a lot of complexity. Any one of the branches can run, and the compiler has no way to tell which, or in what order, since that is decided at runtime. It probably has to be conservative and just assume “all of them”. This probably leads to some confusing lifetime errors for a developer, since it is unclear how two separate pieces of code are related.
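
Here is a rough sketch of that, assuming the tokio runtime’s select! macro (the helpers are made up). Both branches borrow data, and because either one might be the one that runs, both borrows have to stay valid for the whole select!.

// Sketch only: both branches borrow `data`, and since the compiler cannot
// know which branch wins, it keeps both borrows alive conservatively.
async fn race(data: &Vec<u32>) -> u32 {
    tokio::select! {
        a = sum_front(data) => a,
        b = sum_back(data) => b,
        // A branch that needed `&mut data` here would be rejected, even if it
        // could never actually overlap with the readers at runtime.
    }
}

async fn sum_front(data: &Vec<u32>) -> u32 {
    data.iter().take(2).sum()
}

async fn sum_back(data: &Vec<u32>) -> u32 {
    data.iter().rev().take(2).sum()
}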

The fact that tasks (the big chained futures) can be sent to different threads for execution places a Send requirement on a future's state. Again, not obvious for a developer, but at least the compiler gives a pretty clear error for that.
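
A sketch of that, assuming tokio::spawn and its work-stealing scheduler: anything alive across an await becomes a field of the future, so it has to be Send for the task to be accepted.

use std::sync::Arc;

// Sketch only, assuming the tokio runtime.
async fn send_requirement() {
    let shared = Arc::new(vec![1u32, 2, 3]);

    let handle = tokio::spawn(async move {
        // `shared` lives across the await below, so it becomes part of the
        // future's state. Arc<Vec<u32>> is Send, so the future is Send too.
        let sum: u32 = shared.iter().sum();
        tokio::task::yield_now().await;
        sum
    });

    // Holding an Rc (or a std MutexGuard) across that same await would make
    // the future !Send, and tokio::spawn would reject it at compile time.
    let _ = handle.await;
}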

Maybe the big weird is just the loss of the implicit hierarchy established by the stack.

fn example(data: &Vec<u32>) {
    let reference1 = &data;          // Level 1 reference.
    let reference2 = &reference1[0]; // Level 2 reference. 
    
    do_something();
    
    println!("{}", reference2);
}

In synchronous code, references deeper in the stack cannot outlive references higher in the stack.

Now what happens if we do some awaiting…

async fn example(data: &Vec<u32>) {
    let reference1 = &data;          // Level 1 reference.
    let reference2 = &reference1[0]; // Level 2 reference.
    
    do_something().await;            // Stack is dismantled here!
    
    println!("{}", reference2);      // Must reconstruct relationship.
}


// Simplified state machine capturing data for await points.
struct ExampleFuture<'a> {
    data: &'a Vec<u32>,                // Original reference
    reference1: Option<&'a Vec<u32>>,  // Level 1 reference
    reference2: Option<&'a u32>,       // Level 2 reference
    state: State,
}

More mental model-y than reality.

The implicit relationship between the references and the argument is lost. A lifetime must be established to tie them all together, although it still doesn’t quite capture the relationship between the three. That makes things harder for the compiler, and it might have to reject some code it just cannot follow… and give a confusing error along the way.

async fn reallocation_problem(data: &mut Vec<u32>) {
    let reference = &data[0];   // Shared borrow of the first element.

    something().await;

    // Might cause a reallocation, invalidating `reference` (assuming that
    // after the await the compiler no longer knows how they are related).
    data.push(42);
    println!("{}", reference);  // Is this still valid?
}

A big shared lifetime is not enough info for a borrow checker.
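
For what it is worth, the usual way out is to keep borrows from spanning the await at all. A sketch, with a made-up something() helper:

// No borrow is held across the await, so every borrow stays short and the
// borrow checker has nothing to object to.
async fn reallocation_fixed(data: &mut Vec<u32>) {
    let first = data[0];        // Copy the value out instead of borrowing.

    something().await;

    data.push(42);              // Fine: no outstanding borrow to invalidate.
    println!("{}", first);
}

async fn something() {}         // Placeholder for any other async work.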

Lifetimes are granular in synchronous code, but coarse in async. My guess is that the synchronous model starts from the relatively simple-to-analyze stack and can be optimized further from there. The async model is something closer to a directed graph, with futures as the nodes and data dependencies as the edges. The compiler needs to check that parent nodes outlive all of their descendants wherever there is a data dependency. We go from nested stack frames to nested state machines. But futures can make the state machine analysis tricky: they can be created and consumed in different parts of the code, a mismatch between the code’s lexical structure and its runtime behavior. Remember, the sync analysis is nice precisely because the lexical structure aligns with the runtime behavior of the call stack. Future composition, like join! or select!, adds yet another layer by creating tricky borrow scenarios, another mismatch between what the code looks like and how it actually runs.

In sync code, the compiler can mostly check lexical order. “Did x come before y? Cool, we are done here.” Since order is no longer king in async-land, the compiler needs to do more work and wire up types between futures to ensure lifetimes check out for all possible flows. A conservative approach must be taken for practical static analysis. “Ok, x is passed to this future, but that future isn’t consumed till down here. And ah, it is also running at the same time as these three other futures, which could run in any order. They also have a reference to x, gotta check their usage now…”
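
A small sketch of that wiring, assuming the futures crate’s join! macro (tokio::join! behaves the same; the helpers are made up). x is lent to two futures that run interleaved, so the borrow has to be checked against both of them rather than against a simple lexical order.

// Sketch only: both child futures hold a reference to `x` and run
// interleaved, so the borrow of `x` must stay valid for as long as either
// of them might still be running.
async fn check_everything(x: &Vec<u32>) {
    let (front, back) = futures::join!(first_two(x), last_two(x));
    println!("{front} {back}");
}

async fn first_two(x: &Vec<u32>) -> u32 {
    x.iter().take(2).sum()
}

async fn last_two(x: &Vec<u32>) -> u32 {
    x.iter().rev().take(2).sum()
}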

I am getting the sneaking suspicion this might explain why Arc is tossed around so liberally in async contexts. It is Send-able and sidesteps the lifetime worries entirely, pushing that burden onto the runtime. I am not sure I love that pattern; it wasn’t immediately clear to me why ownership was required in these scenarios. But that’s because it’s not, I think it is just used to cut through lifetime knots. While lifetimes can be a pain, the borrow checker is a zero-cost abstraction with all the pain happening at compile time. It would definitely be nice if it were as easy to take advantage of it in async-land too, instead of bailing out to runtime alternatives.
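
For example, a sketch assuming tokio::spawn: a spawned task has to be 'static and Send, so instead of convincing the compiler that a borrow lives long enough, the usual move is to clone an Arc into each task and let reference counting sort it out at runtime.

use std::sync::Arc;

// Sketch only, assuming the tokio runtime. Each task gets its own Arc clone:
// ownership travels with the task and there is no borrow left to check.
async fn spawn_workers(data: Vec<u32>) -> u32 {
    let shared = Arc::new(data);
    let mut handles = Vec::new();

    for id in 0..4u32 {
        let shared = Arc::clone(&shared);
        handles.push(tokio::spawn(async move {
            // The runtime keeps `shared` alive until the last clone drops.
            shared.iter().sum::<u32>() + id
        }));
    }

    let mut total = 0;
    for handle in handles {
        total += handle.await.unwrap_or(0);
    }
    total
}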