Most bitcoin types have simple, linear encoding formats. Take a look at the Header type.
impl Decodable for Header {
    type Decoder = HeaderDecoder;

    fn decoder() -> Self::Decoder {
        HeaderDecoder(Decoder6::new(
            VersionDecoder::new(),        // 4 bytes
            BlockHashDecoder::new(),      // 32 bytes
            TxMerkleNodeDecoder::new(),   // 32 bytes
            BlockTimeDecoder::new(),      // 4 bytes
            CompactTargetDecoder::new(),  // 4 bytes
            encoding::ArrayDecoder::new(), // 4 bytes (nonce)
        ))
    }
}
Every byte is known ahead of time.
When you decode a header, you know before you read a single byte how many you are going to read in total: 80. And the bytes always follow a defined, linear pattern. Four bytes for the version, then thirty-two for the block hash, and so on. The encoding of most bitcoin types follows this super simple, composable pattern.
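To make the "every byte is known ahead of time" point concrete, here is a minimal sketch of a fixed-size header decode over a plain byte array. It is not the decoder API shown above: `RawHeader`, `take`, and `decode_header` are hypothetical names made up for illustration, and the field names are my own assumption.

```rust
/// Total size of a serialized header: 4 + 32 + 32 + 4 + 4 + 4 bytes.
const HEADER_SIZE: usize = 80;

/// Hypothetical plain-data header, for illustration only.
#[derive(Debug, PartialEq)]
struct RawHeader {
    version: i32,
    prev_blockhash: [u8; 32],
    merkle_root: [u8; 32],
    time: u32,
    bits: u32,
    nonce: u32,
}

/// Copies `N` bytes starting at `*pos` and advances the cursor by `N`.
fn take<const N: usize>(bytes: &[u8; HEADER_SIZE], pos: &mut usize) -> [u8; N] {
    let mut out = [0u8; N];
    out.copy_from_slice(&bytes[*pos..*pos + N]);
    *pos += N;
    out
}

/// Decodes the six fields in order. There are no branches and no lookahead,
/// because every field has a fixed width and integers are little-endian.
fn decode_header(bytes: &[u8; HEADER_SIZE]) -> RawHeader {
    let mut pos = 0;
    let header = RawHeader {
        version: i32::from_le_bytes(take(bytes, &mut pos)),
        prev_blockhash: take(bytes, &mut pos),
        merkle_root: take(bytes, &mut pos),
        time: u32::from_le_bytes(take(bytes, &mut pos)),
        bits: u32::from_le_bytes(take(bytes, &mut pos)),
        nonce: u32::from_le_bytes(take(bytes, &mut pos)),
    };
    debug_assert_eq!(pos, HEADER_SIZE);
    header
}
```

Because the input type is `&[u8; 80]`, the "how many bytes" question is answered by the type system before any decoding happens.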
If we focus on just the consensus-enforced types, so on-chain, no p2p stuff, the Block is the top of the pyramid. It is composed of all the internal types (e.g. headers, versions, transactions). And even the Block is relatively simple to encode. So does that mean we are totally in the clear? Not quite, because another beast of a type, the Transaction, is quite complex. Similar to a Block, it has a dynamic element. There can be a dynamic number of inputs and outputs in a transaction. In bitcoin encoding, these are represented with a compact size number followed by that many elements. While this means there is a variable amount of data, it can still be read in a completely linear manner. Read the length, then read that many elements.
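The "read the length, then read that many elements" pattern can be sketched roughly like this. This is a simplified illustration over a byte slice, not the library's decoder: `le_uint`, `read_compact_size`, and `read_list` are hypothetical helpers, and the sketch does not reject non-minimal compact size encodings.

```rust
/// Interprets up to 8 bytes as a little-endian unsigned integer.
fn le_uint(bytes: &[u8]) -> u64 {
    bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | b as u64)
}

/// Reads a compact size from the front of `buf`.
/// Returns the value and the number of bytes consumed (1, 3, 5, or 9),
/// or `None` if `buf` is too short.
fn read_compact_size(buf: &[u8]) -> Option<(u64, usize)> {
    let (&first, rest) = buf.split_first()?;
    let width = match first {
        0xFD => 2, // u16 follows
        0xFE => 4, // u32 follows
        0xFF => 8, // u64 follows
        n => return Some((n as u64, 1)), // the byte is the value itself
    };
    let value = le_uint(rest.get(..width)?);
    Some((value, 1 + width))
}

/// Reads a compact size count, then that many elements using `read_elem`.
/// `read_elem` returns the element and how many bytes it consumed.
fn read_list<T>(
    buf: &[u8],
    read_elem: impl Fn(&[u8]) -> Option<(T, usize)>,
) -> Option<(Vec<T>, usize)> {
    let (count, mut pos) = read_compact_size(buf)?;
    // Deliberately not pre-allocating from `count`, which is untrusted input.
    let mut items = Vec::new();
    for _ in 0..count {
        let (item, used) = read_elem(buf.get(pos..)?)?;
        items.push(item);
        pos += used;
    }
    Some((items, pos))
}
```

The amount of data is variable, but the cursor only ever moves forward, which is what keeps this linear.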
The tricky part of encoding transactions was introduced with SegWit. SegWit is a backwards compatible change, so legacy transactions are still valid. But SegWit-enabled transactions encode the witness in a new spot. Instead of living inside each input, the witnesses are collected in one section after all the inputs and outputs of a transaction. This allows legacy code and new code to agree on a transaction ID, one which does not contain witnesses. SegWit-aware code also knows about the witness ID, which additionally commits to the witnesses. But when you are decoding a transaction, you don’t know ahead of time if it’s SegWit or not.
[version] [input_count][inputs][output_count][outputs] [locktime]
[version][0x00][0x01][input_count][inputs][output_count][outputs][witnesses][locktime]
Legacy vs. SegWit.
Both legacy and SegWit transactions start with a four byte version, but then they diverge. Legacy transactions continue with a compact size input_count, which takes one, three, five, or nine bytes. SegWit transactions instead have two bytes: the first is the zero-byte Marker and the second is the Flag. The marker takes advantage of how compact sizes are encoded: the first byte is always non-zero unless the number is actually zero. And in a consensus (on-chain) context, a transaction always has at least one input. Even coinbase transactions have always been required to have one blank input (although now there are more data requirements there). So just the 5th byte, the marker, is enough to tell if a transaction is SegWit or legacy. The flag byte is a version flag for SegWit and is currently always 0x01.
This isn’t too big of a difference, but it makes the decoding non-linear. You need to read some bytes, might have to backtrack, and then go down one of two paths. If you read the 5th byte and it is non-zero, you know that you are reading a compact size. But it’s a bit awkward that you have already “consumed” the byte. Compare that to the simple Header case above, where you can just consume bytes with certainty. This little wrinkle requires some bespoke state handling, and I am not sure if there is any way around it. The cost of backwards compatibility.
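As a rough illustration of that bespoke state handling, here is a tiny byte-at-a-time state machine for the bytes right after the version. It is only a sketch under the consensus assumption above (at least one input), not how any particular library does it: `Prefix`, `step`, and `classify` are hypothetical names. The interesting part is the `Legacy` variant, which has to carry the already-consumed byte so the compact size decoder can pick it up later.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Prefix {
    /// Version is done; the next byte is the marker or the start of input_count.
    AwaitingMarker,
    /// Saw the 0x00 marker; the next byte must be the flag.
    AwaitingFlag,
    /// Legacy: the consumed byte was the first byte of input_count and
    /// has to be replayed into the compact size decoder.
    Legacy { first_count_byte: u8 },
    /// SegWit: marker and flag consumed; input_count starts at the next byte.
    SegWit,
    /// Marker followed by a flag other than 0x01.
    Invalid,
}

/// Advances the state machine by one byte. Terminal states ignore further input.
fn step(state: Prefix, byte: u8) -> Prefix {
    match (state, byte) {
        (Prefix::AwaitingMarker, 0x00) => Prefix::AwaitingFlag,
        (Prefix::AwaitingMarker, b) => Prefix::Legacy { first_count_byte: b },
        (Prefix::AwaitingFlag, 0x01) => Prefix::SegWit,
        (Prefix::AwaitingFlag, _) => Prefix::Invalid,
        (done, _) => done,
    }
}

/// Feeds the bytes that follow the 4-byte version through the state machine.
fn classify(after_version: &[u8]) -> Prefix {
    after_version
        .iter()
        .fold(Prefix::AwaitingMarker, |state, &b| step(state, b))
}
```

With a full slice in hand you could just peek at index 4 and skip the extra state. The carried byte only becomes necessary when bytes are pushed into the decoder as they arrive and cannot be un-read.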