Send Bounds

// #Rust

I bumped the MSRV on one of the crates I am building, hoping to use the relatively new async functions in traits for testing. Then I ran smack into one of the feature's known limitations, “Send bounds”. Kind of funny, but this is what was said about my exact issue back when the feature was still being experimented with.

One hypothesis is that while people will hit this problem, they will encounter it relatively infrequently, because most of the time spawn won’t be called in code that’s generic over a trait with async functions.

Send and Sync

Rust has two traits which bring safe concurrency to its type system, Send and Sync. The goal is for the compiler to verify thread safety instead of deferring to runtime or worse, just winging it.

Send and Sync are both marker traits. They don’t have any methods, they just mark that a type has some sort of property. They are also both auto traits, which means the compiler will automatically implement the trait for a developer if all the member types have the trait.

So what are the differences? Send says “you can give ownership of this value to a different thread”. Most things are Send, but something like Rc (whose reference count is not atomic, so every clone has to stay on the same thread) or anything built on thread-local state is tied to a specific thread.

Sync is defined as “a type T is Sync if and only if &T is Send”. Is it weird for a reference to be Send? Not really, a shared reference is just a read-only address. Sync feels like a lighter version of Send. But there are also types which are Send but not Sync (aka Send + !Sync). Cell is the common example: it allows mutation through a shared reference without any synchronization, so it is only sound while no other thread holds a reference to it.
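A small sketch of how these two markers can be probed, using only the standard library. The helper names (assert_send, assert_sync, bump_on_other_thread) are made up for this example. The commented-out lines are the ones the compiler would reject.

```rust
use std::cell::Cell;
use std::sync::Arc;
use std::thread;

// Compile-time probes: a call only type-checks if the bound holds.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

// Compiles only because every type listed here has the marker trait.
fn marker_checks() {
    assert_send::<Vec<u8>>();
    assert_sync::<Vec<u8>>();
    assert_send::<Arc<u64>>(); // atomic reference count
    assert_send::<Cell<u64>>(); // ownership may move to another thread
    // assert_sync::<Cell<u64>>();        // rejected: Cell is Send but not Sync
    // assert_send::<std::rc::Rc<u64>>(); // rejected: Rc is not Send
}

// Send in action: the Cell is moved into the new thread, which then owns it.
fn bump_on_other_thread(start: u64) -> u64 {
    let counter = Cell::new(start);
    thread::spawn(move || {
        counter.set(counter.get() + 1);
        counter.get()
    })
    .join()
    .expect("worker thread panicked")
}
```

Calling marker_checks() does nothing at runtime, the point is that it compiles. bump_on_other_thread works because a Cell can be handed over whole, even though it could not be shared by reference.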

Since these are auto traits, I hadn’t really had to deal with them much. But futures require them to be much more visible.

Futures

The golden rule to keep in mind when using the nice async/await syntax is that those simple looking async fn’s are actually state machine structs. And those structs play by all the same rules as any other normal struct you define. But the future-based structs get weird for data used across await points. It’s a caveat you see all the time in the documentation for futures. If some data is only used between two await points, it plays by the exact same rules as any other local variable. It is easy to reason about. But if the data is used across an await point, it now needs to be a member of the future-struct so that it is saved for the next execution run.
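One way to see this for yourself is to compare the sizes of two futures. This is a rough sketch, assuming nothing beyond the standard library. The function names are invented for the example, and exact sizes depend on the compiler version, so only the comparison matters.

```rust
use std::mem::size_of_val;

async fn tick() {}

// `buf` goes out of scope before the await, so it does not need to be saved.
async fn used_between(seed: u8) -> u32 {
    let total = {
        let buf = [seed; 1024];
        buf.iter().map(|b| u32::from(*b)).sum::<u32>()
    };
    tick().await;
    total
}

// `buf` is read after the await, so it has to live inside the future.
async fn used_across(seed: u8) -> u32 {
    let buf = [seed; 1024];
    tick().await;
    buf.iter().map(|b| u32::from(*b)).sum::<u32>()
}

// Returns (size of the "between" future, size of the "across" future).
fn future_sizes() -> (usize, usize) {
    // Calling an async fn only builds the state machine; nothing runs yet.
    (size_of_val(&used_between(1)), size_of_val(&used_across(1)))
}
```

The second future has to carry the whole 1024 byte buffer as a field, while the first only needs to remember the sum.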

Back to Send: it is an auto trait, so it gets applied to a type if all of that type’s fields are Send. Well, a future-struct’s fields are the data held across await points. So a future is Send if the data it holds across await points is Send. And why is this so important for futures? The most popular runtime, tokio, uses a work-stealing executor. It is possible for a task to run on one OS thread, await, then resume on another. But to do that, the future struct must be Send!

A future struct’s members also include the futures it awaits on. So if just one of those is not Send, the future itself is not Send. And that bubbles all the way up to the top of the future task tree.
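Here is a minimal sketch of that rule. The require_send helper and the async functions are hypothetical, they only exist to make the compiler check each future. The commented-out line is the one that fails to compile.

```rust
use std::future::Future;
use std::rc::Rc;
use std::sync::Arc;

async fn tick() {}

// Compile-time probe: only accepts futures that are Send.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

// An Arc is held across the await; Arc<u32> is Send, so the future is too.
async fn holds_arc() -> u32 {
    let shared = Arc::new(7);
    tick().await;
    *shared
}

// The Rc is gone before the await, so it never becomes a field of the future.
async fn rc_between_awaits() -> u32 {
    let n = {
        let local = Rc::new(7);
        *local
    };
    tick().await;
    n
}

// The Rc is read after the await, so the future stores it and is not Send.
async fn rc_across_await() -> u32 {
    let local = Rc::new(7);
    tick().await;
    *local
}

// The awaited futures are fields of this future, so their Send-ness bubbles up.
async fn parent_of_send_children() -> u32 {
    holds_arc().await + rc_between_awaits().await
}

fn send_checks() {
    let _ = require_send(holds_arc());
    let _ = require_send(rc_between_awaits());
    let _ = require_send(parent_of_send_children());
    // let _ = require_send(rc_across_await());
    // rejected: future cannot be sent between threads safely
}
```

If parent_of_send_children awaited rc_across_await instead, it would stop being Send as well, and so would everything above it.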

For a tokio application using the multi-threaded runtime, the work-stealing happens at the task level. So do all futures then have to be Send in order for this to work? Well, there is one more twist.

// What you write:
#[tokio::main]
async fn main() {
    my_work().await;
}

// Roughly what the macro expands to:
fn main() {
    let rt = tokio::runtime::Runtime::new().unwrap();
    rt.block_on(async {
        my_work().await;
    });
}

The main task uses the runtime’s block_on function, which drives the future on the calling thread, so this task is never handed to another thread.

So the task on the main thread does not need to be Send. But still, should the Future trait just require Send? This kind of goes against the ethos of Rust, which avoids requiring functionality that may not be used. A single-threaded runtime could squeeze more performance out of non-Send futures (an Rc instead of an Arc, for example), so requiring Send would needlessly slow it down.
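The difference between the two entry points can be sketched without tokio. The block_on and spawn below are toy stand-ins written for this post, not tokio's implementation. They only mirror the bounds: block_on asks for any Future, while spawn asks for Send + 'static like tokio::spawn does.

```rust
use std::future::Future;
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// A waker that does nothing; enough for futures that never really wait.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Toy block_on: polls the future on the calling thread, so no Send bound.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        thread::yield_now();
    }
}

// Toy spawn: the future is built by the caller and polled on another thread,
// so it has to be Send + 'static.
fn spawn<F>(fut: F) -> thread::JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    thread::spawn(move || block_on(fut))
}

async fn tick() {}

// Holds an Rc across an await: fine for block_on, rejected by spawn.
async fn not_send_work() -> u32 {
    let local = Rc::new(20);
    tick().await;
    *local + 1
}

// Holds an Arc across an await: accepted by both.
async fn send_work() -> u32 {
    let shared = Arc::new(20);
    tick().await;
    *shared + 1
}

fn run_both() -> (u32, u32) {
    let on_this_thread = block_on(not_send_work());
    let on_other_thread = spawn(send_work()).join().expect("worker panicked");
    // spawn(not_send_work()); // rejected: future cannot be sent between threads safely
    (on_this_thread, on_other_thread)
}
```

The Rc-holding future runs happily as long as it stays on the thread that created it, which is exactly the position the main task is in.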

Traits

Another reason why this might not be such a big deal is that the compiler is able to tell whether a future is Send when it is a concrete type. But, as described well in this issue, what if a function is generic over a trait, spawns a task on a work-stealing executor, and calls the async function of that trait? The compiler cannot figure out whether the concrete type’s future is Send. Compilation happens at crate boundaries, so implementations of a trait may live outside of the crate and not be known to the compiler. But the compiler has to make a call without that knowledge: “is this code safe?”. It must assume the worst. Theoretically the compiler could make assumptions if a trait was private or pub(crate), but I don’t believe this is possible as of today.
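A small sketch of the concrete versus generic split, assuming Rust 1.75 or newer. Row, MemoryDb and require_send are placeholders invented for the example. The generic function is commented out because it does not compile.

```rust
use std::future::Future;

// Hypothetical row type and in-memory database, made up for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Row(u32);

trait Database {
    async fn query(&self) -> Vec<Row>;
}

struct MemoryDb {
    rows: Vec<Row>,
}

impl Database for MemoryDb {
    async fn query(&self) -> Vec<Row> {
        self.rows.clone()
    }
}

// Compile-time probe: only accepts futures that are Send.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

// Concrete type: the compiler can look at MemoryDb's future and see it is Send.
fn concrete_is_send(db: &MemoryDb) {
    let _fut = require_send(db.query());
}

// Generic type: nothing promises that D's future is Send, so this is rejected.
// fn generic_is_send<D: Database>(db: &D) {
//     let _fut = require_send(db.query());
// }
```

Nothing about MemoryDb changed between the two functions. The only difference is how much the compiler is allowed to know about the type behind query.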

The initial support for async functions in traits, released with Rust 1.75.0, does not directly address the Send complexity. That is to say, the async functions de-sugar into future-returning functions which don’t require Send. Requiring Send would have broken the “too much functionality” principle. But another closely related, and much more verbose, feature was also released in 1.75.0: Return Position Impl Trait in Trait (RPITIT).

// When you write this (stabilized in 1.75):
trait Database {
    async fn query(&self) -> Vec<Row>;
}

// It desugars to this (which requires RPITIT, also stabilized in 1.75):
trait Database {
    fn query(&self) -> impl Future<Output = Vec<Row>> + '_;
    //                ^^^^ return position impl trait in trait
}

// Could have desugared to a verbose associated type:
trait Database {
    type QueryFuture<'a>: Future<Output = Vec<Row>> + 'a
    where
        Self: 'a;

    fn query(&self) -> Self::QueryFuture<'_>;
}

De-sugaring the async syntax.

RPITIT allows a trait method to be written with an impl Trait return type. I think it might still be implemented under the hood with something like the verbose associated type. But what is important here is that a developer can skip the super nice async fn syntax and write the de-sugared form themselves. This is where we can add a Send trait bound.

trait Database {
    fn query(&self) -> impl Future<Output = Vec<Row>> + Send + '_;
}

Adding the Send bound by hand limits which implementations are allowed, but lets generic callers spawn the future on a multi-threaded executor.
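Putting it together, here is a rough end to end sketch, assuming Rust 1.75 or newer. Row, MemoryDb, block_on and query_on_worker are all invented for this example, and a plain std thread stands in for a work-stealing executor.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// Hypothetical row type, standing in for the post's `Row`.
#[derive(Debug, Clone, PartialEq)]
struct Row(u32);

trait Database {
    // Send is promised for every implementation's future.
    fn query(&self) -> impl Future<Output = Vec<Row>> + Send + '_;
}

// Hypothetical in-memory implementation.
struct MemoryDb {
    rows: Vec<Row>,
}

impl Database for MemoryDb {
    // `async fn` still works here, as long as the future really is Send.
    async fn query(&self) -> Vec<Row> {
        self.rows.clone()
    }
}

// A waker that does nothing; enough for futures that never really wait.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Toy block_on: polls the future to completion on the calling thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        thread::yield_now();
    }
}

// Generic over the trait: the future is built on this thread and then moved
// to a worker. This relies on the Send bound declared in the trait.
fn query_on_worker<D>(db: Arc<D>) -> Vec<Row>
where
    D: Database + Send + Sync + 'static,
{
    let fut = async move { db.query().await };
    thread::spawn(move || block_on(fut))
        .join()
        .expect("worker thread panicked")
}
```

If the trait declared query as a plain async fn instead, query_on_worker would be rejected, because nothing would promise that D's future is Send. The cost is on the implementer's side: an implementation that holds an Rc across an await no longer fits the trait.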

By the way, are you wondering about that anonymous lifetime '_?

// What you write:
fn query(&self) -> impl Future<Output = Vec<Row>> + Send + '_;

// What the compiler infers through lifetime elision:
fn query<'a>(&'a self) -> impl Future<Output = Vec<Row>> + Send + 'a;
//       ^^   ^^                                                   ^^
//       |    |                                                    |
//       |    self borrows for 'a                                  |
//       explicit lifetime parameter              future lives for 'a

The future might hold references to data from &self, so it can’t outlive self.

Return Type Notation

I can deal with this for now, although it is kind of a bummer that there is this tradeoff. But it appears a lot of research is being done to iron it out. One which looks promising is Return Type Notation, where the bound can be defined at the call site instead of the trait. This keeps the trait flexible, while maintaining the guarantees.

https://smallcultfollowing.com/babysteps/series/send-bound-problem/ • https://theincredibleholk.org/blog/2023/02/13/inferred-async-send-bounds/ • https://github.com/rust-lang/rust/issues/103854 • https://blog.rust-lang.org/inside-rust/2023/05/03/stabilizing-async-fn-in-trait/ • https://www.youtube.com/watch?v=yOezcP-XaIw • https://github.com/rust-lang/rfcs/pull/3425 • https://github.com/rust-lang/rust/issues/91611 • https://rust-lang.github.io/rfcs/3185-static-async-fn-in-trait.html