Kix's `push_decode` library provides little I/O driver wrappers for the encoders and decoders it produces. Since the coders are sans-io, bytes need to be pushed and pulled through them. For an encoder, the driver pulls bytes out of it and pushes them into a sink (e.g. a TCP stream). The encoder is coding a type into bytes, but doesn't know where to put them; that is up to the driver. A caller could do this by hand, which shows off the flexibility of the sans-io interface, but it's pretty straightforward to wire together some common runtimes like the standard library's blocking I/O or async tokio.
With an encoder, the caller probably always wants some sort of buffered sink. Encoders generally produce many small chunks of bytes, and it would be inefficient for each chunk to be its own system call. But from the driver's perspective, the write interface is the same whether or not a buffer is used.
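To make this concrete, here is a sketch of an encoder-side driver. The `ChunkEncoder` trait, `VecEncoder` type, and `drive_encoder` function are hypothetical names of my own for illustration, not push_decode's actual API. The point it demonstrates is from the text: the driver only bounds on `std::io::Write`, so the caller can hand it a raw sink or a `BufWriter` and the driver code is identical.

```rust
use std::io::{self, BufWriter, Write};

// Hypothetical sans-io encoder interface (illustrative, not push_decode's
// real trait): the encoder exposes its pending output bytes and is told
// how many of them were written.
trait ChunkEncoder {
    /// Next pending bytes; empty when encoding is finished.
    fn pending(&self) -> &[u8];
    /// Mark `n` bytes of the pending output as written.
    fn advance(&mut self, n: usize);
}

/// Drive an encoder into any `Write` sink. Whether the sink buffers is
/// the caller's choice; the driver is oblivious.
fn drive_encoder<E: ChunkEncoder, W: Write + ?Sized>(enc: &mut E, sink: &mut W) -> io::Result<()> {
    while !enc.pending().is_empty() {
        let n = sink.write(enc.pending())?;
        enc.advance(n);
    }
    sink.flush()
}

// Toy encoder that just drains a pre-built byte vector.
struct VecEncoder { data: Vec<u8>, pos: usize }

impl ChunkEncoder for VecEncoder {
    fn pending(&self) -> &[u8] { &self.data[self.pos..] }
    fn advance(&mut self, n: usize) { self.pos += n; }
}

fn main() -> io::Result<()> {
    // Unbuffered sink: a Vec implements Write directly.
    let mut enc = VecEncoder { data: b"hello".to_vec(), pos: 0 };
    let mut plain = Vec::new();
    drive_encoder(&mut enc, &mut plain)?;
    assert_eq!(plain, b"hello");

    // Buffered sink: same driver, same interface.
    let mut enc = VecEncoder { data: b"hello".to_vec(), pos: 0 };
    let mut buffered = BufWriter::new(Vec::new());
    drive_encoder(&mut enc, &mut buffered)?;
    assert_eq!(buffered.into_inner().expect("flush"), b"hello");
    Ok(())
}
```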
The decoding half is more complex. The driver pulls bytes from a source and pushes them through the decoder, and at some point a type is produced out the other end. In the encoding scenario, the driver can just keep asking the encoder for bytes until it signals that it is done. And it is easy to make this very efficient: the bytes can probably just be a slice, a reference, which might even point into the internal fields of the type. A zero-copy solution. Back to decoding, the first step for the driver is to pull some bytes from the source. But how many? The thing that knows how many bytes it needs is the decoder.
And it is a little more complicated than that. A decoder might have a general idea of how many bytes it needs for the type it produces, say a bitcoin `Transaction`. But a `Transaction` can be dynamically sized, with many inputs and outputs, and the decoder won't know the exact number of bytes until it reads them all. So decoders usually know the minimum number of bytes needed to learn the next minimum number of bytes to decode.
Now let's layer on the usual I/O read interface. What is a driver to do? How many bytes should it allocate for this task? I believe a general driver (as in, not purpose-built for a specific use case) has two options.
- Require a `BufRead` implementation.
- Require a `Read` implementation and just guess. But this can be improved by asking the decoder for some hints.
BufRead
Similar to buffering on the sink side, there are probably tons of use cases which would benefit from soaking up some system calls on the source side with a buffer. The second benefit of a `BufRead` is that the interface is pretty ideal for driving the decoder: the buffer handles all the allocation bookkeeping.
```rust
/// Synchronously decodes a value from the given reader using a custom decoder.
#[cfg(feature = "std")]
pub fn decode_sync_with<D: Decoder, R: std::io::BufRead + ?Sized>(reader: &mut R, mut decoder: D) -> Result<D::Value, ReadError<D::Error>> {
    loop {
        let buf = match reader.fill_buf() {
            Ok(buf) => buf,
            Err(error) if error.kind() == std::io::ErrorKind::Interrupted => continue,
            Err(error) => return Err(ReadError::Read(error)),
        };
        if buf.is_empty() {
            break decoder.end().map_err(ReadError::Decode);
        }
        let num = decoder.bytes_received(buf).map_err(ReadError::Decode)?;
        let buf_len = buf.len();
        reader.consume(num);
        if num < buf_len {
            break decoder.end().map_err(ReadError::Decode);
        }
    }
}
```
A push_decode driver requiring a `BufRead` implementation.
An extremely clean implementation. But unlike the buffered sink, the buffered source has a different interface: see the `fill_buf` and `consume` methods above. This code very much needs this specialized interface, so now it's on the caller to provide a source exposed like this. For lots of types, like in-memory slices, this is already blanket implemented. And other sources, like a `TcpStream`, can be wrapped with a `BufReader`.
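Wrapping is just the standard library's `BufReader`, which adapts any `Read` into a `BufRead`. A small sketch (an in-memory `&[u8]` stands in for a `TcpStream` so it runs without a network):

```rust
use std::io::{BufRead, BufReader, Read};

fn main() -> std::io::Result<()> {
    // Any `Read` works here; a byte slice stands in for e.g. a `TcpStream`.
    let source: &[u8] = b"hello world";
    let mut reader = BufReader::with_capacity(4, source);

    // The `BufRead` interface the driver needs: peek at the buffer, then
    // tell the reader how much was actually used.
    let chunk = reader.fill_buf()?;
    assert_eq!(chunk, b"hell"); // first chunk is limited by the capacity
    let n = chunk.len();
    reader.consume(n);

    // The rest is still available on later calls.
    let mut rest = Vec::new();
    reader.read_to_end(&mut rest)?;
    assert_eq!(rest, b"o world");
    Ok(())
}
```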
This might still be too large of an ask for some callers. The ecosystem is largely centered on the `Read` interface, not the `BufRead` one. Accepting the more general `Read` would be ideal, but I am not sure there is a simple way to do that and only have one interface. For example, you can't just auto-wrap everything in a `BufReader` under the hood; that may end up with double buffers.
I don't think there is a simple, performant way to accept the general `Read` trait as the argument and then dynamically use the `BufRead` interface if the type supports it. Which isn't too surprising; it sounds very dynamic-y.
I imagine there are also scenarios where you just want to manage the buffer by hand, if super memory constrained. And I don't believe `BufRead` is in the core library, so it is not no_std friendly.
So all in all, there might just have to be two decoding interfaces.
Clamping
This is the strategy `push_decode` takes, exposing a `decode_sync_unbuffered_with` function alongside the buffered `decode_sync_with` version above.
```rust
decode_sync_with<D: Decoder, R: std::io::BufRead + ?Sized>(reader: &mut R, mut decoder: D)

decode_sync_unbuffered_with<const BUF_LEN: usize, D: KnownMinLenDecoder, R: std::io::Read + ?Sized>(reader: &mut R, decoder: D)
```
Signature comparison of buffered vs. non-buffered.
The un-buffered version (un-buffered as in, it does not require a buffered reader, since it does the buffering itself) still takes an I/O reader source and a decoder as input, but the requirements have been tweaked. The reader is only bound on `Read`, no more `BufRead`, lowering its requirements. The decoder has its bar raised: it now needs to be a `KnownMinLenDecoder`. There is also the `BUF_LEN` generic parameter, which we will get into later.
The `KnownMinLenDecoder` requirement is what allows the driver to ask the decoder: hey, how many more bytes do you need to make your next decision? Without this information, the driver needs to first allocate a chunk of memory and then pass the whole thing to `Read` to get some bytes. But with the known minimum length, the driver can clamp the allocated memory, passing only a small slice to `Read` so that it quickly fills the exact amount needed.
That still leaves the question: how much memory should be allocated for the buffer in the first place, and where should it be allocated? This leads us to the new `BUF_LEN` const generic. This is, unsurprisingly, the parameter which controls the buffer size, but why use a generic? It allows the buffer to be a stack allocation (size known at compile time) instead of on the heap.
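A simplified sketch of my own (not push_decode's actual driver) of how the const generic buffer and the minimum-length hint fit together: the buffer lives on the stack with a compile-time size chosen by the caller, and each read is clamped to the smaller of the buffer size and what the decoder says it still needs.

```rust
use std::io::Read;

/// Read up to `needed` bytes through a stack-allocated, compile-time-sized
/// buffer, clamping each read to what the decoder says it needs.
/// (Simplified sketch, not push_decode's real driver.)
fn read_clamped<const BUF_LEN: usize, R: Read>(
    reader: &mut R,
    needed: usize,
) -> std::io::Result<Vec<u8>> {
    // Stack allocation: the size is fixed at compile time by the caller.
    let mut buf = [0u8; BUF_LEN];
    let mut out = Vec::with_capacity(needed);
    while out.len() < needed {
        // Clamp: never ask the reader for more than is still needed.
        let want = (needed - out.len()).min(BUF_LEN);
        let n = reader.read(&mut buf[..want])?;
        if n == 0 {
            break; // EOF
        }
        out.extend_from_slice(&buf[..n]);
    }
    Ok(out)
}

fn main() -> std::io::Result<()> {
    let mut source: &[u8] = b"abcdefgh";
    // 4-byte stack buffer; the decoder "needs" 6 bytes.
    let got = read_clamped::<4, _>(&mut source, 6)?;
    assert_eq!(got, b"abcdef");
    // The clamp left the remaining bytes untouched in the source.
    assert_eq!(source, b"gh");
    Ok(())
}
```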
I can't confidently say I know all the trade-offs of stack vs. heap allocation for this scenario. However, I have to imagine the compile-time nature of the stack allows the compiler to further optimize it, at the cost of requiring another argument from a caller who might not have any idea what to put here.