Encoder GATs

// #Rust

Encoders get bytes pulled out of them, ideally with zero extra copies of the bytes. Here is a proposal for the pull encoders. The Encoder and Encodable interfaces could maybe be merged into one in some scenarios, but I think it's simpler for the Encoder to own its read state. What I want to explore is the cost of the layers, but I think it is possible for it all to be zero-copy.

pub trait Encodable {
    /// The encoder associated with this type. Conceptually, the encoder is like
    /// an iterator which yields byte slices.
    type Encoder: Encoder;
    /// Constructs a "default encoder" for the type.
    fn encoder(&self) -> Self::Encoder;
}

pub trait Encoder {
    /// Yields the next byte slice to be encoded, updating the encoder state.
    ///
    /// Returns `None` if the encoder is exhausted. Once this method returns `None`,
    /// all subsequent calls will return `None`.
    fn advance(&mut self) -> Option<&[u8]>;
    /// Moves the encoder to its previous state (once).
    ///
    /// It is guaranteed that after calling this method once, the next call to
    /// [`Self::advance`] will return the most recent non-`None` value, if any, that
    /// it returned in previous calls.
    ///
    /// No behavior is specified if this method is called multiple times in a row.
    fn unadvance(&mut self);
}

Proposed interface for the pull encoding without a GAT.

Encoder probably doesn’t have to change much. advance takes a mutable self reference so that it can update its state on the call. This could be broken out into two separate calls like in push_decode, but I am not sure the trade-off means much. advance just exposes a slice, so it doesn’t dictate who owns the bytes.
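To make the ownership point concrete, here is a minimal sketch of two encoders behind the same trait, one borrowing its bytes and one owning them. `BorrowedEncoder`, `OwnedEncoder`, and `drain` are hypothetical names made up for illustration, and each encoder yields its bytes as a single slice, which is an assumption and not part of the proposal.

```rust
/// The pull-encoder trait from the proposal above.
pub trait Encoder {
    fn advance(&mut self) -> Option<&[u8]>;
    fn unadvance(&mut self);
}

/// Hypothetical encoder that borrows its bytes from the caller.
pub struct BorrowedEncoder<'a> {
    bytes: &'a [u8],
    yielded: bool,
}

/// Hypothetical encoder that owns its bytes.
pub struct OwnedEncoder {
    bytes: Vec<u8>,
    yielded: bool,
}

impl<'a> BorrowedEncoder<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Self { bytes, yielded: false }
    }
}

impl OwnedEncoder {
    pub fn new(bytes: Vec<u8>) -> Self {
        Self { bytes, yielded: false }
    }
}

impl Encoder for BorrowedEncoder<'_> {
    fn advance(&mut self) -> Option<&[u8]> {
        if self.yielded {
            return None;
        }
        self.yielded = true;
        // Hands out the borrowed slice; nothing is copied.
        Some(self.bytes)
    }

    fn unadvance(&mut self) {
        // Step back once so the single slice is yielded again.
        self.yielded = false;
    }
}

impl Encoder for OwnedEncoder {
    fn advance(&mut self) -> Option<&[u8]> {
        if self.yielded {
            return None;
        }
        self.yielded = true;
        // Lends out the encoder's own buffer.
        Some(self.bytes.as_slice())
    }

    fn unadvance(&mut self) {
        self.yielded = false;
    }
}

/// Pulls every slice out of an encoder and collects the bytes.
pub fn drain(enc: &mut impl Encoder) -> Vec<u8> {
    let mut out = Vec::new();
    while let Some(chunk) = enc.advance() {
        out.extend_from_slice(chunk);
    }
    out
}
```

The consumer in `drain` only ever sees `&[u8]`, so it cannot tell which of the two encoders it is pulling from.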

Encodable is where it gets interesting. This proposal has an associated type for the encoder, and the encoder function takes a non-mutable reference to self and tosses back an encoder. &self means that the encoder could borrow from self, so I don’t think that half is the issue. But does returning Self::Encoder force the Encoder to own a copy of the data it is encoding? I don’t think returning &Self::Encoder would help, because that just means the encoder would need to be stored somewhere on the encodable type, and updating its state would require a bump up to a mutable self reference.
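A minimal sketch of what that question looks like in practice, assuming a hypothetical `ArrayEncoder` and leaving `unadvance` out for brevity. With no lifetime on the trait or on the associated type, there is nothing in the impl that can name the borrow taken by `encoder(&self)`, so an impl that wants to hand out its own bytes seems to end up copying them into the encoder.

```rust
/// Trimmed version of the proposal's trait (`unadvance` omitted here).
pub trait Encoder {
    fn advance(&mut self) -> Option<&[u8]>;
}

/// The non-GAT, non-lifetime `Encodable` from the proposal.
pub trait Encodable {
    type Encoder: Encoder;
    fn encoder(&self) -> Self::Encoder;
}

/// Hypothetical encoder that owns a 32-byte copy of the data.
pub struct ArrayEncoder {
    bytes: [u8; 32],
    yielded: bool,
}

impl Encoder for ArrayEncoder {
    fn advance(&mut self) -> Option<&[u8]> {
        if self.yielded {
            return None;
        }
        self.yielded = true;
        Some(&self.bytes[..])
    }
}

impl Encodable for [u8; 32] {
    // A borrowing encoder type would need a lifetime here, but no lifetime
    // in this impl refers to the `&self` borrow made in `encoder`.
    type Encoder = ArrayEncoder;

    fn encoder(&self) -> Self::Encoder {
        // `*self` copies all 32 bytes into the encoder.
        ArrayEncoder { bytes: *self, yielded: false }
    }
}
```

For 32 bytes the copy is cheap, but the same shape applied to a large buffer would mean cloning the whole thing just to read it back out.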

trait Encodable<'a> {
    type Encoder: Encoder;
    fn encoder(&'a self) -> Self::Encoder;
}

impl<'a> Encodable<'a> for [u8; 32] {
    type Encoder = BytesEncoder<'a>;
    fn encoder(&'a self) -> Self::Encoder {
        BytesEncoder::new(self.as_slice())
    }
}

Add a lifetime parameter?

A lifetime parameter can be added to Encodable, but this puts a pretty heavy burden on the user. 'a is always going to be connected to the self reference, but callers have to wire this up whenever they use the trait, so it bubbles up through every generic signature. It allows zero-cost encoding and is super flexible, but at a heavy cost, like having to understand things like Higher-Ranked Trait Bounds to try to iron out the ergonomics.
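Here is a rough sketch of where the higher-ranked bound shows up. `encode_owned` is a hypothetical helper, `BytesEncoder` is a stand-in written for this example (not the real type), and `unadvance` is omitted. Because the function borrows `value` internally, no lifetime on its signature can describe that borrow, so the bound has to hold for every `'a`.

```rust
/// Trimmed version of the proposal's trait (`unadvance` omitted here).
pub trait Encoder {
    fn advance(&mut self) -> Option<&[u8]>;
}

/// The lifetime-parameterized `Encodable` from the snippet above.
pub trait Encodable<'a> {
    type Encoder: Encoder;
    fn encoder(&'a self) -> Self::Encoder;
}

/// Stand-in for `BytesEncoder`, yielding one borrowed slice.
pub struct BytesEncoder<'a> {
    bytes: &'a [u8],
    yielded: bool,
}

impl Encoder for BytesEncoder<'_> {
    fn advance(&mut self) -> Option<&[u8]> {
        if self.yielded {
            return None;
        }
        self.yielded = true;
        Some(self.bytes)
    }
}

impl<'a> Encodable<'a> for [u8; 32] {
    type Encoder = BytesEncoder<'a>;

    fn encoder(&'a self) -> Self::Encoder {
        BytesEncoder { bytes: self.as_slice(), yielded: false }
    }
}

/// Hypothetical helper that takes a value and encodes it to a `Vec<u8>`.
///
/// The borrow of `value` only exists inside the function body, so the
/// caller-facing bound must be higher-ranked: `for<'a> Encodable<'a>`.
pub fn encode_owned<T>(value: T) -> Vec<u8>
where
    T: for<'a> Encodable<'a>,
{
    let mut out = Vec::new();
    let mut enc = value.encoder();
    while let Some(chunk) = enc.advance() {
        out.extend_from_slice(chunk);
    }
    out
}
```

Every generic function that borrows and encodes internally needs some version of that `for<'a>` bound, which is the bubbling-up described above.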

I think this is showing off the benefits that the GAT brings to the table.

trait Encodable {
    type Encoder<'a>: Encoder where Self: 'a;  // Lifetime scoped to method call
    fn encoder(&self) -> Self::Encoder<'_>;
}

Add a lifetime parameter to the associated type, make it a generic associated type (GAT).

The GAT is about keeping (most of?) the flexibility, but improving ease of use for the caller. I am not sure it is actually unlocking any new functionality. Instead of a lifetime on the trait, it is on the associated type and is auto-wired up by the encoder method. It is now an implementation detail instead of a burden on the caller.
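A minimal sketch of the caller's side with the GAT version, again using a stand-in `BytesEncoder` and a hypothetical `encode_to_vec` helper, with `unadvance` omitted. The bound on the generic function is just `T: Encodable`, with no lifetime parameter and no higher-ranked bound in sight.

```rust
/// Trimmed version of the proposal's trait (`unadvance` omitted here).
pub trait Encoder {
    fn advance(&mut self) -> Option<&[u8]>;
}

/// The GAT version of `Encodable` from the snippet above.
pub trait Encodable {
    type Encoder<'a>: Encoder
    where
        Self: 'a;

    fn encoder(&self) -> Self::Encoder<'_>;
}

/// Stand-in for `BytesEncoder`, yielding one borrowed slice.
pub struct BytesEncoder<'a> {
    bytes: &'a [u8],
    yielded: bool,
}

impl Encoder for BytesEncoder<'_> {
    fn advance(&mut self) -> Option<&[u8]> {
        if self.yielded {
            return None;
        }
        self.yielded = true;
        Some(self.bytes)
    }
}

impl Encodable for [u8; 32] {
    type Encoder<'a> = BytesEncoder<'a> where Self: 'a;

    fn encoder(&self) -> Self::Encoder<'_> {
        // Borrows straight from the array; no bytes are copied.
        BytesEncoder { bytes: self.as_slice(), yielded: false }
    }
}

/// Hypothetical helper: the lifetime never appears in the caller's bounds.
pub fn encode_to_vec<T: Encodable>(value: &T) -> Vec<u8> {
    let mut out = Vec::new();
    let mut enc = value.encoder();
    while let Some(chunk) = enc.advance() {
        out.extend_from_slice(chunk);
    }
    out
}
```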

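The Self bound is saying “Self must outlive 'a”, or put the other way around, an encoder borrowed for 'a can never outlive the data it borrows from. It is an outlives bound rather than anything to do with type variance. The encoder method already makes that connection implicitly through &self, but as far as I can tell the compiler currently requires the bound to be written out on the trait anyway when a method ties the GAT's lifetime to &self like this. Even if it were optional, I would guess it is best to be explicit and give the compiler as much help as possible for when things get tricky. Another nice feature of GATs is that the implementation can totally ignore the lifetime parameter and just return an owned type if the situation calls for it.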
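A sketch of that last point, assuming a hypothetical `ArrayEncoder` and a little-endian layout for `u32` (both picked only for illustration, with `unadvance` omitted). The impl declares the `'a` parameter but never uses it, and the encoder owns its four bytes.

```rust
/// Trimmed version of the proposal's trait (`unadvance` omitted here).
pub trait Encoder {
    fn advance(&mut self) -> Option<&[u8]>;
}

/// The GAT version of `Encodable`.
pub trait Encodable {
    type Encoder<'a>: Encoder
    where
        Self: 'a;

    fn encoder(&self) -> Self::Encoder<'_>;
}

/// Hypothetical encoder that owns a small fixed-size buffer.
pub struct ArrayEncoder<const N: usize> {
    bytes: [u8; N],
    yielded: bool,
}

impl<const N: usize> Encoder for ArrayEncoder<N> {
    fn advance(&mut self) -> Option<&[u8]> {
        if self.yielded {
            return None;
        }
        self.yielded = true;
        Some(&self.bytes[..])
    }
}

impl Encodable for u32 {
    // `'a` is declared but unused: the encoder borrows nothing from `self`.
    type Encoder<'a> = ArrayEncoder<4> where Self: 'a;

    fn encoder(&self) -> Self::Encoder<'_> {
        // Assumption for this sketch: integers are encoded little-endian.
        ArrayEncoder { bytes: self.to_le_bytes(), yielded: false }
    }
}

/// Hypothetical helper that drains any `Encodable` into a `Vec<u8>`.
pub fn encode_to_vec<T: Encodable>(value: &T) -> Vec<u8> {
    let mut out = Vec::new();
    let mut enc = value.encoder();
    while let Some(chunk) = enc.advance() {
        out.extend_from_slice(chunk);
    }
    out
}
```

The caller-side code is identical whether the impl borrows or owns, which is most of the appeal.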

impl Encodable for BlockHash {
    type Encoder<'a> = BytesEncoder<'a> where Self: 'a;
    
    fn encoder(&self) -> Self::Encoder<'_> {
        BytesEncoder::without_length_prefix(self.as_byte_array())
    }
}

Zero-copy encoding using a GAT. This might not be the most compelling use case because BlockHashes are only 32 bytes, but it shows the difference.