There is currently a super annoying fracture in the Rust async ecosystem: there are no standard library AsyncRead or AsyncWrite traits. These would be the equivalents of the blocking I/O Read and Write traits. The Read and Write traits are part of an elite group of powerful abstractions included in the standard library. What makes them so good? They have a small core, but rich extensions. With just a handful of functions implemented, the interfaces cut across files, sockets, memory buffers, and processes. And they can compose their core functions into super powerful ones like read_exact or write_all. The abstraction has clear semantics no matter the implementation. This makes them very useful in library code, where the caller can decide what I/O they want to use and the library doesn’t care at all.
And perhaps the most powerful characteristic of the domain is how well readers and writers compose together into one big version. Let’s take a look at the dead simple, blocking version, io::Read.
pub trait Read {
    /// Pull some bytes from this source into the specified
    /// buffer, returning how many bytes were read.
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}
The only required method to implement on the io::Read trait.
The read method takes a mutable slice to write bytes into from its source, returning how many bytes were read into it. The &mut self implies that the function needs mutable access to the source, either owning it or holding some reference to pull from. So when composing readers, a reader has an inner reader it is pulling from. A reader can be composed with a buffer for performance, a decompressor, a decryptor, and a decoder, which all roll up to a reader with the same interface: a buffer of bytes.
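Here is a quick illustration of that composition using only standard library adapters; the file path is just a placeholder.

use std::io::{self, BufReader, Read};

// Works for any reader: a file, a socket, an in-memory slice, or a composition of them.
fn read_all(source: impl Read) -> io::Result<String> {
    let mut reader = BufReader::new(source);
    let mut out = String::new();
    reader.read_to_string(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    // Compose readers: a file, chained with an in-memory slice, capped at
    // 1024 bytes. The whole stack is still just one `impl Read`.
    let file = std::fs::File::open("Cargo.toml")?;
    let composed = file.chain(&b"\nappended from memory\n"[..]).take(1024);
    println!("{}", read_all(composed)?);
    Ok(())
}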
pub trait Write {
    /// Attempt to write some data into the object,
    /// returning how many bytes were successfully written.
    fn write(&mut self, buf: &[u8]) -> Result<usize>;

    /// Useful for adapters and explicit buffers
    /// themselves for ensuring that all buffered data
    /// has been pushed out to the ‘true sink’.
    fn flush(&mut self) -> Result<()>;
}
The write half in io::Write is slightly more complex, but the same principles apply.
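For instance, a quick sketch of why flush matters when a buffering writer sits in front of the true sink (the file path is just an example):

use std::io::{self, BufWriter, Write};

// The buffer is itself just another writer wrapping the real sink.
fn write_greeting(sink: impl Write) -> io::Result<()> {
    let mut writer = BufWriter::new(sink);
    writer.write_all(b"hello, world\n")?;
    // Push any buffered bytes out to the true sink before dropping the writer.
    writer.flush()
}

fn main() -> io::Result<()> {
    write_greeting(io::stdout())?;
    write_greeting(std::fs::File::create("/tmp/greeting.txt")?)
}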
Most abstractions generally require more domain specific context to be useful, so they don’t get to be in the standard library. But being in the standard library is awesome, it’s the one library we all have access to, one less thing to coordinate on. So, why can’t some async versions be settled upon?
There are some heavy fundamental differences between blocking and non-blocking runtimes. For blocking, a lot of the complexity is pushed down into the OS layer. In a non-blocking runtime that complexity is pulled up to the application layer, which gives the developer more fine-grained control. The non-blocking runtime actually lives in the application. And kinda confusingly, it sandwiches the application code: the runtime at the top and the futures at the bottom, with the futures handling the I/O. The runtime and the futures are implicitly linked, but any sort of AsyncRead/AsyncWrite interface will influence this relationship. There is usually some tension in the method signatures between an ergonomic, flexible interface and raw performance. I don’t think it is obvious what interface strikes a good balance, so the standard library hasn’t adopted one yet.
There are two popular interfaces out there. The ergonomic, flexible, runtime-agnostic ones in futures-rs which is part of the rust-lang community. And the performant, runtime-specific ones in tokio. What makes this fracture difficult is that tokio is by far the most popular runtime. As a library developer, do you tie yourself to the runtime or try to remain agnostic? Are either of these heading towards the standard library?
pub trait AsyncRead {
    // Required method
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<Result<usize, Error>>;
}
AsyncRead in futures-rs.
The futures-rs version takes a more conservative approach and looks a bit like the blocking version. The new bits are the boilerplate to hook a future into an async runtime. The Future trait is just fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>; which, as you can see, is pretty similar. With the introduction of async functions in traits in Rust 1.75, this could be written as the much more readable async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>; and the compiler would transform it into the more boilerplate-heavy poll version. But similar to the blocking version, this is the only function which needs to be implemented to unlock a bunch of powerful functions in AsyncReadExt.
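To make that concrete, here is a minimal sketch of what such a trait and a trivial in-memory implementation could look like on Rust 1.75+. The trait and type names are made up for illustration; this is not the actual futures-rs or tokio definition.

use std::io;

// A hypothetical async read trait written with async functions in traits.
trait AsyncReadLike {
    async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

// A trivial in-memory source implementing the hypothetical trait.
struct MemReader {
    data: Vec<u8>,
    pos: usize,
}

impl AsyncReadLike for MemReader {
    async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Copy as many bytes as fit in the caller's buffer.
        let n = (self.data.len() - self.pos).min(buf.len());
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}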
pub trait AsyncRead {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<io::Result<()>>;
}
Tokio’s own AsyncRead trait.
Obnoxiously, the popular tokio runtime tweaked its versions of AsyncRead/AsyncWrite, creating the split in the ecosystem, but it might have good reason. This comment is one of the more concise descriptions of why introducing that ReadBuf type is worth it. And ReadBuf is interesting. It handles cases like the “uninitialized memory handling” issue in Rust, which is something to explore with the bip324 library.
Despite this lower level change in tokio, in a lot of cases the code using the higher level AsyncReadExt functions is exactly the same as the futures-rs version. The only difference is the import path for the trait. The ReadBuf type is generally handled internally and not exposed at the top level. I think if you want the lower-level performance gains with a high-level interface, you can use the read_buf function in AsyncReadExt with the bytes crate for a buffer. But that is a lot of tokio-specific things.
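A rough sketch of that tokio-specific path, assuming a tokio runtime, the bytes crate, and a placeholder address:

use bytes::BytesMut;
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let mut stream = TcpStream::connect("127.0.0.1:8333").await?;
    // BytesMut tracks its own initialized length, so read_buf can hand the
    // runtime spare capacity without zeroing it first.
    let mut buf = BytesMut::with_capacity(4096);
    let n = stream.read_buf(&mut buf).await?;
    println!("read {n} bytes");
    Ok(())
}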
So what is a library author to do? Maybe the introduction of async functions in traits in Rust 1.75 allows for a third option. Instead of forcing the consumer to use your async I/O interface of choice, create a domain specific interface which they can implement. The library can also implement it for the standard runtimes (e.g. tokio) so that the most popular option is good to go. But then there is a loss of composability.
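A minimal sketch of the idea, with made-up trait and feature names; the tokio blanket implementation is just one way such an escape hatch could look.

use std::io;

// A hypothetical, domain specific transport trait a library could define
// instead of committing to the futures-rs or tokio I/O traits.
trait PacketTransport {
    async fn read_exact(&mut self, buf: &mut [u8]) -> io::Result<()>;
    async fn write_all(&mut self, buf: &[u8]) -> io::Result<()>;
}

// Ship an implementation for the most popular runtime behind a feature flag
// so tokio users get support out of the box.
#[cfg(feature = "tokio")]
impl<T> PacketTransport for T
where
    T: tokio::io::AsyncRead + tokio::io::AsyncWrite + Unpin,
{
    async fn read_exact(&mut self, buf: &mut [u8]) -> io::Result<()> {
        use tokio::io::AsyncReadExt;
        AsyncReadExt::read_exact(self, buf).await.map(|_| ())
    }

    async fn write_all(&mut self, buf: &[u8]) -> io::Result<()> {
        use tokio::io::AsyncWriteExt;
        AsyncWriteExt::write_all(self, buf).await
    }
}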
I’m gonna analyze the chacha20-poly1305 and bip324 crates to see if that brings any clarity.
chacha20-poly1305
impl ChaCha20Poly1305 {
    /// Make a new instance of a ChaCha20Poly1305 AEAD.
    pub const fn new(key: Key, nonce: Nonce) -> Self

    /// Encrypt content in place and return the Poly1305 16-byte authentication tag.
    ///
    /// # Parameters
    ///
    /// - `content` - Plaintext to be encrypted in place.
    /// - `aad` - Optional metadata covered by the authentication tag.
    ///
    /// # Returns
    ///
    /// The 16-byte authentication tag.
    pub fn encrypt(
        self,
        content: &mut [u8],
        aad: Option<&[u8]>
    ) -> [u8; 16]

    /// Decrypt the ciphertext in place if authentication tag is correct.
    ///
    /// # Parameters
    ///
    /// - `content` - Ciphertext to be decrypted in place.
    /// - `tag` - 16-byte authentication tag.
    /// - `aad` - Optional metadata covered by the authentication tag.
    pub fn decrypt(
        self,
        content: &mut [u8],
        tag: [u8; 16],
        aad: Option<&[u8]>,
    ) -> Result<(), Error>
}
The exposed interface.
The chacha20-poly1305 crate doesn’t currently use any Read/Write or their async equivalent traits. It has a minimal interface which encrypts and decrypts in place. If we focus just on the read path for now, pulling bytes off a source and decrypting them, some sort of ChaCha20Poly1305Reader might make sense. It would have an inner reader source which it could pass the buffer it was called with, and then perform the decrypt call on the buffer before returning. The authentication tag and context are the tricky part though; they don’t fit into a reader’s basic fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>; function signature. How would the read call even know which of the bytes it is pulling off are the tag? The packet-like structure (tags covering a whole message) is at odds with the stream-like interface (arbitrary chunks of data on demand). I don’t think there is much to gain here.
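A hypothetical wrapper makes the mismatch visible. This type is not part of the crate; it just shows where a stream interface runs out of information.

use std::io::{self, Read};

// Hypothetical adapter, not part of chacha20-poly1305.
struct ChaCha20Poly1305Reader<R> {
    inner: R,
    // ...key, nonce, and cipher state would live here...
}

impl<R: Read> Read for ChaCha20Poly1305Reader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        // Problem: `buf` now holds an arbitrary chunk of the stream. There is
        // no way to tell whether the 16-byte tag, or only part of it, landed
        // in this chunk, so there is nothing to authenticate and decrypt yet.
        Ok(n)
    }
}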
bip324
Perhaps the higher level bip324 library is more interesting; there is quite a bit more going on here. There are the lower level PacketReader/PacketWriter types as well as the higher level AsyncProtocolReader/AsyncProtocolWriter.
impl PacketReader {
    /// Decode the length, in bytes, of the rest of the inbound packet.
    ///
    /// Note that this does not decode to the length of contents described
    /// in BIP324, and is meant to represent the rest of the inbound packet
    /// which includes the header byte and the 16-byte authentication tag.
    ///
    /// # Arguments
    ///
    /// * `len_bytes` - The first three bytes of the ciphertext.
    ///
    /// # Returns
    ///
    /// The length of the rest of the packet.
    pub fn decypt_len(&mut self, len_bytes: [u8; 3]) -> usize {
        ...
    }

    /// Decrypt the packet header byte and contents.
    ///
    /// # Arguments
    ///
    /// * `ciphertext` - The packet from the peer excluding the first 3 length bytes. It should contain
    ///   the header, contents, and authentication tag.
    /// * `contents` - Mutable buffer to write plaintext. Note that the first byte is the header byte
    ///   containing protocol flags.
    /// * `aad` - Optional associated authenticated data.
    ///
    /// # Returns
    ///
    /// A `Result` containing:
    /// * `Ok(PacketType)`: A flag indicating if the decoded packet is a decoy or not.
    /// * `Err(Error)`: An error that occurred during decryption.
    ///
    /// # Errors
    ///
    /// * `CiphertextTooSmall` - Ciphertext argument does not contain a whole packet.
    /// * `BufferTooSmall` - Contents buffer argument is not large enough for plaintext.
    /// * Decryption errors for any failures such as a tag mismatch.
    pub fn decrypt_payload_no_alloc(
        &mut self,
        ciphertext: &[u8],
        contents: &mut [u8],
        aad: Option<&[u8]>,
    ) -> Result<(), Error> {
        ...
    }

    /// Decrypt the packet header byte and contents.
    ///
    /// # Arguments
    ///
    /// * `ciphertext` - The packet from the peer excluding the first 3 length bytes. It should contain
    ///   the header, contents, and authentication tag.
    /// * `aad` - Optional associated authenticated data.
    ///
    /// # Returns
    ///
    /// A `Result` containing:
    /// * `Ok(Payload)`: The plaintext header and contents.
    /// * `Err(Error)`: An error that occurred during decryption.
    ///
    /// # Errors
    ///
    /// * `CiphertextTooSmall` - Ciphertext argument does not contain a whole packet.
    #[cfg(feature = "alloc")]
    pub fn decrypt_payload(
        &mut self,
        ciphertext: &[u8],
        aad: Option<&[u8]>,
    ) -> Result<Payload, Error> {
        let mut payload = vec![0u8; ciphertext.len() - NUM_TAG_BYTES];
        self.decrypt_payload_no_alloc(ciphertext, &mut payload, aad)?;
        Ok(Payload::new(payload))
    }
}
The lower level packet reader.
The PacketReader interface is obviously influenced by the chacha20-poly1305 AEAD it is using under the hood. I left in the small implementation of the decrypt_payload function since it shows off the “uninitialized memory” performance issue, where the payload buffer needs to be zeroed out even though it is promptly written into before ever being read. Maybe that can be ironed out at a higher level.
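For context, here is roughly how a caller could drive the no-alloc path over any blocking source, reusing buffers across packets. The import path, the hard-coded 16-byte tag length, and the collapsed error handling are assumptions for this sketch.

use std::io::Read;

fn read_packet<R: Read>(
    reader: &mut bip324::PacketReader,
    source: &mut R,
    ciphertext: &mut Vec<u8>,
    contents: &mut Vec<u8>,
) {
    // The first three bytes on the wire encode the length of the rest of the packet.
    let mut len_bytes = [0u8; 3];
    source.read_exact(&mut len_bytes).expect("read length bytes");
    let len = reader.decypt_len(len_bytes);

    // Reuse the caller's buffers instead of allocating per packet.
    ciphertext.resize(len, 0);
    contents.resize(len - 16, 0);
    source.read_exact(ciphertext).expect("read rest of packet");
    reader
        .decrypt_payload_no_alloc(ciphertext, contents, None)
        .expect("decrypt packet");
}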
The decrypt_payload_no_alloc function takes the ciphertext, splits off the tag at the end (assuming it is the last 16 bytes), copies the encrypted contents into the mutable buffer, and then decrypts it in place. It might make sense to avoid the copy in this function, but that would require extending it with the smarts to deserialize the decrypted data. It seems weird for the given buffer to hold decrypted data with the tag still sitting there, but maybe it is worth it for the memory usage? In any case, probably not a good spot for Read/Write implementations given the packet nature again. Let’s take a look at the higher level interface to see if it sheds some light.
impl AsyncProtocolReader {
    /// Decrypt contents of received packet from buffer.
    ///
    /// This function is cancellation safe.
    ///
    /// # Arguments
    ///
    /// * `buffer` - Asynchronous I/O buffer to pull bytes from.
    ///
    /// # Returns
    ///
    /// A `Result` containing:
    /// * `Ok(Payload)`: A decrypted payload.
    /// * `Err(ProtocolError)`: An error that occurred during the read or decryption.
    pub async fn read_and_decrypt<R>(&mut self, buffer: &mut R) -> Result<Payload, ProtocolError>
    where
        R: AsyncRead + Unpin + Send
}
The interface of the high level async reader.
Ok, so that is starting to look more like it. It takes a mutable reference to a reader, which makes sense; a caller would want the flexibility to plug many different network sources into this (e.g. TCP or WebSocket) and just have it decrypted. It is a bit of a misnomer that this is currently called a Reader since it is not actually implementing AsyncRead itself. But should it? It would require a new struct to own or reference the source. The Payload type is a simple wrapper around some owned bytes plus a helper function to check if they are a dummy packet. The new struct could just use that to decide if it should write bytes back to the given buffer. I might have to take some care to ensure it is still cancellation safe though.
The write path is a little tricky; it wouldn’t be able to expose some sort of “is dummy” flag on every write very easily. But it could maybe take some policy when constructed to write dummy packets automatically.
In both these cases though, we still have the issue that the protocol is packet based. A Read interface, whether blocking or async, still exposes a stream of bytes. It might just not be a good fit.
alloc
The bip324 library disables the standard library by default with #![no_std]. It then conditionally enables memory allocation capabilities with a feature flag which flips on extern crate alloc. That essentially just pulls in whatever allocator is set by the caller.
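The pattern looks roughly like this at the crate root; a sketch, assuming the feature is named alloc and the helper function is made up.

#![no_std]

// Only pull in the alloc crate when the caller opts in.
#[cfg(feature = "alloc")]
extern crate alloc;

#[cfg(feature = "alloc")]
use alloc::vec::Vec;

// Compiled only when the alloc feature is enabled.
#[cfg(feature = "alloc")]
pub fn to_owned_packet(bytes: &[u8]) -> Vec<u8> {
    bytes.to_vec()
}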
Maybe the PacketReader/PacketWriter level shouldn’t even bother with alloc? That would give the caller the chance to re-use buffers if they so choose, performing deserialization in place. However, while the BIP-324 protocol is tied to bitcoin, it technically can be used to send any packets, not just bitcoin p2p ones. It might be too large an assumption at this low level to bake in deserialization to in-memory models.
I/O dependencies
It feels like it doesn’t make too much sense for either library to implement any Read or Write traits themselves. But taking a broad I/O interface to pull and push bytes to is just too big a win to miss out on. Having the bip324 library work across any transport out of the box is huge. And that can be said for any networking related library.
So, which traits should you use?
- Go all in on tokio, squeeze out the performance, but likely get tied to tokio tooling like the bytes crate. Probably slowly and unknowingly adopt more tokio tooling; less flexible.
- Stay agnostic and only use futures-rs interfaces. Put the burden on a tokio-using caller to translate implementations from the tokio versions to the futures-rs versions with a library like tokio_util’s compat. Adds a little runtime cost.
- Only use the set of methods shared between the two AsyncReadExt/AsyncWriteExt traits and feature flag swap the imports. Performant, but obviously a limited functionality interface.
- Define your own I/O interface (possible with Rust 1.75+) and implement it for futures-rs and tokio. Also allows others to implement it, but not sure how valuable that is at the low level I/O domain.
Option #3 is what I am currently rolling with in bip324. I think it makes sense to default to the futures-rs imports and treat tokio like an extension. It does add a little bloat to the crate for tokio callers who don’t care about futures-rs. The other option would be to only enable one or the other, but then you need to be careful about the --no-default-features case where neither is enabled.
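A minimal sketch of what that feature flag swap can look like; the feature names and the helper function are illustrative, not the crate’s actual ones.

// Swap only the import path; the function below compiles under either trait set.
#[cfg(all(feature = "futures", not(feature = "tokio")))]
use futures::io::{AsyncRead, AsyncReadExt};
#[cfg(feature = "tokio")]
use tokio::io::{AsyncRead, AsyncReadExt};

// Sticks to methods both extension traits share, like read_exact.
#[cfg(any(feature = "futures", feature = "tokio"))]
pub async fn read_version_byte<R>(source: &mut R) -> std::io::Result<u8>
where
    R: AsyncRead + Unpin + Send,
{
    let mut byte = [0u8; 1];
    source.read_exact(&mut byte).await?;
    Ok(byte[0])
}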