// #Craft
The bip324 library is written in sans-I/O style, which means it doesn’t tie itself to a specific I/O interface: it doesn’t block and it doesn’t force async functions. It just handles the shared protocol logic, which blocking and async I/O drivers can then use under the hood.
The sans-I/O interface is one of buffer slices. The I/O driver pulls bytes from somewhere (e.g. a network socket), passes them to the sans-I/O library, gets back a new buffer of bytes, and pushes those bytes somewhere else. The largest burden of the sans-I/O interface is that there is now a function call where there usually isn’t. If one just assumes an I/O implementation (which is most of us, most of the time), this all usually happens in one function.
// Calculate required buffer size for encryption.
pub const fn encryption_buffer_len(plaintext_len: usize) -> usize

// Encrypt into caller-provided buffer.
pub fn encrypt(
    &mut self,
    plaintext: &[u8],
    ciphertext_buffer: &mut [u8],
    packet_type: PacketType,
    aad: Option<&[u8]>,
) -> Result<(), Error>
The OutboundCipher interface which encrypts the plaintext. The interface is all buffer slices.
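From the driver’s side, the call pattern is to size the ciphertext with encryption_buffer_len and hand both slices to encrypt. A minimal sketch of that pattern follows; the helper name encrypt_into is made up, and the PacketType::Genuine variant and the lack of AAD are assumptions, not necessarily the library’s defaults.

// Hypothetical driver-side helper: the caller owns the scratch buffer,
// the sans-I/O cipher only fills the front of it. The scratch buffer
// must be at least `needed` bytes or the slice below panics.
fn encrypt_into<'a>(
    cipher: &mut OutboundCipher,
    plaintext: &[u8],
    scratch: &'a mut [u8],
) -> Result<&'a [u8], Error> {
    // Ask the sans-I/O layer how many bytes the ciphertext requires.
    let needed = OutboundCipher::encryption_buffer_len(plaintext.len());
    // Encrypt a genuine (non-decoy) packet with no additional data.
    cipher.encrypt(plaintext, &mut scratch[..needed], PacketType::Genuine, None)?;
    // The driver then pushes this slice somewhere else, e.g. a network socket.
    Ok(&scratch[..needed])
}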
Now technically, sans-I/O does not require the interface implementation to have zero memory allocations. But zero allocations means the library is no_std (no standard library) compatible, which allows it to be used in embedded environments that do not have access to the standard library. For most of the bip324 library, it is just not that big a leap to go from the sans-I/O requirements to the no_std ones. And it is easy to add an allocation wrapper function around the no_std version which maintains the sans-I/O requirement.
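A minimal sketch of such a wrapper, assuming the OutboundCipher interface above; the name encrypt_to_vec and the alloc feature gate are illustrative, not the library’s actual API.

// Hypothetical allocating convenience wrapper around the no_std interface.
#[cfg(feature = "alloc")]
pub fn encrypt_to_vec(
    cipher: &mut OutboundCipher,
    plaintext: &[u8],
    packet_type: PacketType,
    aad: Option<&[u8]>,
) -> Result<Vec<u8>, Error> {
    // Size the buffer with the zero-allocation helper...
    let mut ciphertext = vec![0u8; OutboundCipher::encryption_buffer_len(plaintext.len())];
    // ...and delegate to the caller-provided-buffer interface.
    cipher.encrypt(plaintext, &mut ciphertext, packet_type, aad)?;
    Ok(ciphertext)
}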
But there is one tricky spot. Before the ciphers are completely fired up and a caller is just using the encrypt/decrypt operations, BIP-324 requires a handshake to be performed between the peers in order to establish the channel. The handshake, as described back in the Typestate Pattern log, is a non-trivial sequence of steps. Two aspects of the handshake, garbage bytes and decoy packets, are used to hide the shape of the traffic. Quite a bit of garbage is allowed to be sent, up to 4095 bytes. And technically, any number of decoy packets can be sent!
With the sans-I/O and no_std requirements in mind, the last step of the handshake, which carries these two large, unwieldy memory requirements, gets gnarly.
/// Success variants for receive_version.
pub enum HandshakeAuthentication {
    /// Successfully completed.
    Complete {
        cipher: CipherSession,
        bytes_consumed: usize,
    },
    /// Need more data - returns handshake for caller to retry with more ciphertext.
    NeedMoreData(Handshake<SentVersion>),
}
impl Handshake<SentVersion> {
    /// Authenticate remote peer's garbage, decoy packets, and version packet.
    ///
    /// This method is unique in the handshake process as it requires a **mutable** input buffer
    /// to perform in-place decryption operations. The buffer contains everything after the 64
    /// byte public key received from the remote peer: optional garbage bytes, garbage terminator,
    /// and encrypted packets (decoys and final version packet).
    ///
    /// The input buffer is mutable because the caller generally doesn't care
    /// about the decoy and version packets, including allocating memory for them.
    ///
    /// # Parameters
    ///
    /// * `input_buffer` - **Mutable** buffer containing garbage + terminator + encrypted packets.
    ///   The buffer will be modified during in-place decryption operations.
    ///
    /// # Returns
    ///
    /// * `Complete { cipher, bytes_consumed }` - Handshake succeeded, secure session established.
    /// * `NeedMoreData(handshake)` - Insufficient data, retry by extending the buffer.
    pub fn receive_version(
        mut self,
        input_buffer: &mut [u8],
    ) -> Result<HandshakeAuthentication, Error>
}
One version of the receive_version step which operates on a mutable buffer.
To complete the handshake, the local peer needs to receive all the garbage bytes by reading up to 4095 bytes in search of the previously agreed-upon garbage terminator. It then needs to keep reading any number of decoy packets, of any size, until it finds the version packet, which is used to negotiate any future upgrades of the channel. It also needs to authenticate those previously read garbage bytes with the first packet it reads, whether that packet is a decoy or the version packet.
Now here’s the thing. If we only supported the sans-I/O requirements and not the no_std ones, we could simply split this into two steps. Ask the caller to give a non-mutable buffer that they think contains the garbage and the terminator. If the function finds the terminator, it makes a copy of the garbage bytes since they are required in the next step (they are authenticated with the first packet). This is an allocation! In the next step, the caller is really just decrypting packets, although that higher level interface can’t be exposed directly since some future version negotiation might need to happen here. But it does make things easier, like returning the exact number of bytes required for the next decryption: either the rest of the length header, or the ciphertext once the length has been decrypted.
Are there ways to avoid the garbage allocation in order to keep it no_std? We could just put a 4095 byte array on the stack, but burning that much stack is not very friendly to the constrained environments no_std targets, which defeats the whole point.
Another way is what we have above: the receive_version function is designed to be “re-called”, with the caller extending the input_buffer if asked. The garbage is (re)found on every call. This keeps the interface as simple as possible for the caller, but it is still tricky.
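From the caller’s perspective, the re-call loop looks something like this. A sketch of a hypothetical blocking driver: the function name and chunk size are made up, and real socket error handling is elided.

// Hypothetical blocking driver for the single-function approach: read more
// ciphertext, extend the buffer, and retry until authentication completes.
// (A real driver would also handle EOF and socket errors.)
fn drive_receive_version(
    mut handshake: Handshake<SentVersion>,
    socket: &mut impl std::io::Read,
) -> Result<CipherSession, Error> {
    let mut input_buffer = Vec::new();
    let mut chunk = [0u8; 1024];
    loop {
        // Pull more bytes off the wire and extend the input buffer.
        let n = socket.read(&mut chunk).expect("socket read");
        input_buffer.extend_from_slice(&chunk[..n]);
        match handshake.receive_version(&mut input_buffer)? {
            // Handshake done, anything past bytes_consumed is already
            // post-handshake traffic and belongs to the application.
            HandshakeAuthentication::Complete { cipher, bytes_consumed } => {
                let _leftover = &input_buffer[bytes_consumed..];
                return Ok(cipher);
            }
            // Not enough data yet, the handshake is handed back to retry.
            HandshakeAuthentication::NeedMoreData(h) => handshake = h,
        }
    }
}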
Perhaps the operation should be split into two steps. The first attempts to find the garbage and the second focuses on the decoys and version. This increases the size of the API, but it potentially simplifies both steps.
The single-function-extend-input and dual-function approaches share a pain point: a buffer is over-read in search of all the bytes. That is why some sort of bytes_consumed must be returned to the caller, so that they can “reset” the input buffer. For the single function, this happens at the very end. For the dual function, this happens only after receiving the garbage, because after that the channel is encrypted and has length headers.
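With a growable buffer on the driver side, that reset is just dropping the consumed prefix. A tiny sketch, assuming a Vec-backed input buffer:

// Drop the prefix a handshake step already processed, keep the unread tail.
// bytes_consumed is whatever the step reported back.
input_buffer.drain(..bytes_consumed);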
A dual-function approach could hang on to a garbage reference between steps so the caller doesn’t have to manage that directly.
impl Handshake<SentVersion> {
    // Returns immutable ref to garbage, but a lifetime is tying things together.
    pub fn receive_garbage<'a>(self, input_buffer: &'a [u8])
        -> Result<(Handshake<ReceivedGarbage<'a>>, &'a [u8]), Error>
}

impl<'a> Handshake<ReceivedGarbage<'a>> {
    // Needs mutable ref for in-place decryption.
    pub fn receive_version(self, input_buffer: &mut [u8])
        -> Result<Handshake<Completed>, Error>
}
A new receive_garbage step which returns any non-consumed buffer?
I see a lifetime issue here though. receive_garbage wants to hang onto the found garbage so it can be authenticated in the next step. It is also returning the un-consumed part of the buffer with the same lifetime. That’s the buffer which has to be passed to the next step; it contains at least the first part of the decoys and version packet.
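To make the conflict concrete, the caller code would look roughly like this (hypothetical, and it does not compile): the handshake still holds an immutable borrow of input_buffer while the next step asks for a mutable one.

// Sketch of the caller code the borrow checker rejects.
let (handshake, remaining) = handshake.receive_garbage(&input_buffer)?;
let offset = input_buffer.len() - remaining.len();
// ERROR: cannot borrow `input_buffer` as mutable because it is also borrowed
// as immutable -- the handshake still holds the garbage slice.
let handshake = handshake.receive_version(&mut input_buffer[offset..])?;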
The receive_version step could switch back to the more conservative “decrypt into a new buffer” function instead of decrypting in place. This would keep all references to the buffer immutable, but it places a kind of unnecessary burden on the caller: allocating memory for packets which they don’t care about. That appears to be the tradeoff though: burden the caller with memory allocation vs. ask them to pass back the found garbage slice.
impl Handshake<SentVersion> {
    pub fn receive_garbage<'a>(self, input_buffer: &'a [u8])
        -> Result<(Handshake<ReceivedGarbage<'a>>, &'a [u8]), Error>
}

impl<'a> Handshake<ReceivedGarbage<'a>> {
    pub fn receive_version(self, input_buffer: &[u8], output_buffer: &mut [u8])
        -> Result<Handshake<Completed>, Error>
}
An output buffer might be required to satisfy lifetime safety, but maybe better to just ask the caller to copy the un-consumed bytes to a new buffer?
I might be overthinking it with the second element of the tuple in the garbage return: (Handshake<ReceivedGarbage<'a>>, &'a [u8]). Theoretically the Handshake<ReceivedGarbage<'a>> type could just expose the length of the captured garbage and the caller could perform their own buffer management. The second element is just a convenience, since buffer management is unavoidable unless the caller guesses the exact number of garbage bytes. That does happen to be easier today, since not a lot of implementations send garbage, but it is probably not something to bank on. So accepting that the caller needs to manage a buffer, what is the most helpful thing to return right away? It might just be a bytes_consumed usize. If they want a slice, they can very easily re-slice their buffer with that value. The usize is simpler than jumping straight to another reference to manage.
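A sketch of that leaner return, hypothetical and only following the reasoning above, not the library’s current API:

impl Handshake<SentVersion> {
    // Hold the found garbage internally and report only how many bytes of
    // the input were consumed (garbage plus terminator).
    pub fn receive_garbage<'a>(self, input_buffer: &'a [u8])
        -> Result<(Handshake<ReceivedGarbage<'a>>, usize), Error>
}

And the caller re-slices with the returned count:

let (handshake, bytes_consumed) = handshake.receive_garbage(&input_buffer)?;
// The unconsumed tail holds the start of the decoy and version packets.
let remaining = &input_buffer[bytes_consumed..];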