Height from a Hash

// #Bitcoin

Something perhaps interesting about the blockchain is that we are always talking about the block height, with the genesis block at height 0. But the block height is implicit. It is not explicitly baked into a block header along with the version, previous block hash, transaction merkle root, timestamp, difficulty target, and nonce. And while a block can be re-orged out of the canonical blockchain, it can never have its height changed and remain valid, since its header commits to the previous block’s hash. It’s either in at that height, or it’s out.
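To make the "height is implicit" point concrete, here is a minimal sketch that unpacks the 80-byte header into the six fields listed above. The names `BlockHeader`, `parse_header`, and `block_hash` are made up for illustration; the genesis values are the well-known mainnet ones.

```python
import hashlib
import struct
from typing import NamedTuple


class BlockHeader(NamedTuple):
    version: int
    prev_block_hash: bytes  # 32 bytes, internal (reversed) byte order
    merkle_root: bytes      # 32 bytes, internal (reversed) byte order
    timestamp: int
    bits: int               # compact encoding of the difficulty target
    nonce: int


# Little-endian, no padding: 4 + 32 + 32 + 4 + 4 + 4 = 80 bytes.
HEADER_FORMAT = "<i32s32sIII"


def parse_header(raw: bytes) -> BlockHeader:
    """Unpack a serialized header. Note there is no height field to read."""
    if len(raw) != 80:
        raise ValueError("a block header is exactly 80 bytes")
    return BlockHeader(*struct.unpack(HEADER_FORMAT, raw))


def block_hash(raw: bytes) -> str:
    """Double SHA-256 of the header, shown in the usual reversed hex form."""
    digest = hashlib.sha256(hashlib.sha256(raw).digest()).digest()
    return digest[::-1].hex()


# Mainnet genesis header, rebuilt from its individual fields.
GENESIS_HEADER = struct.pack(
    HEADER_FORMAT,
    1,
    bytes(32),  # no previous block
    bytes.fromhex(
        "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
    )[::-1],
    1231006505,
    0x1D00FFFF,
    2083236893,
)

header = parse_header(GENESIS_HEADER)
print(header._fields)             # six fields, none of them a height
print(block_hash(GENESIS_HEADER))
```

The only link between blocks is `prev_block_hash`, so a height only falls out of walking that chain back to genesis.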

So maybe the block height could have been added to the header for easier indexing. I am not sure of the exact reason it wasn’t, but maybe it has to do with keeping non-consensus-critical bytes out of the blockchain. Plus it is easier to not have to verify it for each new block.

But then along came BIP-34, soft-forked in during 2013, which requires miners to put the block height in the coinbase transaction’s input script (the scriptSig).

BIP-34 Side Quest

BIP-34 is a unique piece of bitcoin history. It was one of the first soft forks to follow the BIP process from proposal to implementation to activation. Soft forks had been done before, but in more of a just-sling-it manner.

BIP-34 kept things soft by bringing structure to an open-ended requirement, the coinbase transaction. The coinbase transaction has always had all sorts of weird rules since this is how the bitcoin supply is increased. But it is also just another transaction in the block. Well, the first one, but it has the same structure as the rest of the transactions. Which includes inputs. Which, when you think about it, is kinda weird cause it is by definition not going to be spending from any previous transaction outputs. It divvies up the implicit block reward!

Even before BIP-34, there were still some restrictions on the coinbase transaction’s inputs. It could only have one input. The input has to spend from a transaction with an ID of all zeroes (32 bytes of 0x00). The output index must be 0xFFFFFFFF (-1 as a signed integer, or 4294967295 unsigned). But the scriptSig could contain any data, just limited by size, which is now 100 bytes. Also the sequence number is not restricted, but that is less interesting here.
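Those pre-BIP-34 rules are small enough to sketch as a check. This is an illustration, not Bitcoin Core's code: `TxInput` and `coinbase_inputs_ok` are hypothetical names, and the 2-byte lower bound on the scriptSig is an extra detail taken from Bitcoin Core's coinbase length check rather than from the text above.

```python
from dataclasses import dataclass
from typing import Sequence

NULL_TXID = bytes(32)    # 32 bytes of 0x00
NULL_VOUT = 0xFFFFFFFF   # 4294967295, i.e. -1 if read as a signed 32-bit int


@dataclass
class TxInput:
    prev_txid: bytes
    prev_vout: int
    script_sig: bytes
    sequence: int  # unrestricted for the coinbase input


def coinbase_inputs_ok(inputs: Sequence[TxInput]) -> bool:
    """Check the structural rules for a coinbase transaction's inputs."""
    if len(inputs) != 1:
        return False
    txin = inputs[0]
    if txin.prev_txid != NULL_TXID or txin.prev_vout != NULL_VOUT:
        return False
    # Free-form data, bounded in size (2 to 100 bytes in Bitcoin Core).
    return 2 <= len(txin.script_sig) <= 100


# The genesis coinbase input: two small pushes, then the newspaper headline.
headline = b"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks"
genesis_input = TxInput(
    prev_txid=NULL_TXID,
    prev_vout=NULL_VOUT,
    script_sig=bytes.fromhex("04ffff001d0104") + bytes([len(headline)]) + headline,
    sequence=0xFFFFFFFF,
)
print(coinbase_inputs_ok([genesis_input]))
```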

But why have an input at all? The coinbase transaction is special enough already, so should it just be unique and have no inputs instead of this super standardized one? I dunno if Satoshi ever said why they went with an input, but it does make sense from a standardization standpoint. Every transaction has at least one input. And every coin originates from some input, with (all zeros txid, 0xFFFFFFFF vout) effectively saying “these coins came from the mining reward”.

There is also a practical reason for there to be a chunk of data for the miners to mess with. The nonce field in the header is 32 bits, so 2^32 possible values, and this can easily be exhausted before finding a valid proof of work. Miners need some more space to change the block hash, and the coinbase transaction is a natural spot since it is their transaction to define. Given the restrictions on the coinbase transaction’s input since the beginning, they have two fields to mess with, the sequence number and the scriptSig. But the sequence number is only another four bytes, so generally miners just hop straight to the scriptSig and iterate there.
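A rough sketch of why the scriptSig works as extra nonce space: changing any byte of it changes the coinbase txid, which changes the merkle root, which changes the header being hashed. The layout below (height push followed by an 8-byte "extranonce") and the placeholder output script are assumptions for illustration, not how any particular miner does it.

```python
import hashlib
import struct


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def serialize_coinbase(script_sig: bytes, value_sats: int, script_pubkey: bytes) -> bytes:
    """Serialize a minimal legacy (non-SegWit) coinbase with a single output."""
    if len(script_sig) >= 0xFD or len(script_pubkey) >= 0xFD:
        raise ValueError("sketch only handles single-byte length prefixes")
    return b"".join([
        struct.pack("<i", 1),            # version
        b"\x01",                         # input count
        bytes(32),                       # previous txid: all zeroes
        struct.pack("<I", 0xFFFFFFFF),   # previous output index
        bytes([len(script_sig)]),
        script_sig,
        struct.pack("<I", 0xFFFFFFFF),   # sequence
        b"\x01",                         # output count
        struct.pack("<q", value_sats),
        bytes([len(script_pubkey)]),
        script_pubkey,
        struct.pack("<I", 0),            # locktime
    ])


def merkle_root_for_extranonce(height_push: bytes, extranonce: int) -> bytes:
    # Hypothetical layout: height push, then an 8-byte push the miner iterates.
    script_sig = height_push + b"\x08" + struct.pack("<Q", extranonce)
    # b"\x51" is a placeholder output script, not a real payout.
    coinbase = serialize_coinbase(script_sig, 2_500_000_000, b"\x51")
    # With only the coinbase in the block, the merkle root is just its txid.
    return sha256d(coinbase)


height_push = bytes.fromhex("035b7a03")  # height 227,931 as a 3-byte push
for extranonce in range(3):
    print(extranonce, merkle_root_for_extranonce(height_push, extranonce).hex())
```

Each new merkle root gives the miner a fresh header to grind, and so a fresh 2^32 nonce values.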

Finally, even Satoshi thought there might be reasons to shove arbitrary data onto the blockchain (although you gotta mine a block in this case), placing the famous “The Times 03/Jan/2009 Chancellor on brink of second bailout for banks” in the very first coinbase transaction’s scriptSig.

And from today’s perspective, it looks like it was a really good call to keep the coinbase transaction transaction-like and open-ended. A lot of soft forks have folded things into it since then, like SegWit’s witness commitment in an OP_RETURN output. BIP-34 was just the first, giving a bit of structure to the coinbase transaction’s scriptSig and requiring it to commit to the block height.
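For the curious, here is a sketch of what that commitment looks like on the wire, as I understand BIP-34: the scriptSig must start with a data push of the height as a little-endian script number. The function names are hypothetical, and the encoder skips heights of 16 and below, which use single opcodes and never show up on mainnet after activation.

```python
def encode_bip34_height(height: int) -> bytes:
    """Encode a height as the push expected at the start of the scriptSig."""
    if height <= 16:
        raise ValueError("small heights use OP_0/OP_1..OP_16, not handled here")
    body = bytearray()
    n = height
    while n:
        body.append(n & 0xFF)  # little-endian, least significant byte first
        n >>= 8
    if body[-1] & 0x80:
        # The top bit of the last byte is a sign bit; pad to stay positive.
        body.append(0x00)
    return bytes([len(body)]) + bytes(body)


def decode_bip34_height(script_sig: bytes) -> int:
    """Read the height back out of the start of a coinbase scriptSig."""
    if not script_sig:
        raise ValueError("empty scriptSig")
    push_len = script_sig[0]
    if not 1 <= push_len <= 8 or len(script_sig) < 1 + push_len:
        raise ValueError("scriptSig does not start with a short data push")
    body = script_sig[1:1 + push_len]
    if body[-1] & 0x80:
        raise ValueError("negative script number cannot be a height")
    return int.from_bytes(body, "little")


prefix = encode_bip34_height(227_931)
print(prefix.hex())                                    # 035b7a03
print(decode_bip34_height(prefix + b"\x04mine"))       # 227931
```

One caveat: the decoder will happily return a number for older blocks too. The genesis scriptSig starts with a 4-byte push, which decodes to a nonsense "height", so the caller has to know the block is post-BIP-34.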

Hashes and Heights

Does having this info in a block change anything about how we refer to blocks?

There are three ways to refer to a block. By hash, by height, or by hash and height. Involving a hash narrows it down to one possibility (assuming no crazy hash collision), so it is more explicit. Combining both allows an application to verify a user is talking about what they think they are talking about. It can verify that yes, the block with this hash is at this height. It can do this by counting up all the blocks since the genesis, but with BIP-34, it might be able to just peek at the coinbase transaction.

Not sure if that would ever be helpful. If you have the header chain, might as well count them. And if you don’t, asking for the coinbase transaction is kind of a big ask: you have to get all the block’s data and parse it. This would only work with post-BIP-34 blocks as well. Seems a little sketch, not sure when it would be useful.

Light Clients

Light clients in general have a use case for only caring about the blockchain past a certain point. Maybe a user knows they received their first bitcoin at address xyz in block 701,123. They want to use a light client to track any activity for the address, and can tell that client, “hey, don’t bother syncing from genesis, just start at block 701,123”.

We talk in block heights, but they introduce ambiguity, especially close to the blockchain tip (within ~6 blocks), where a re-org is more likely to happen.

If a light client wants to expose some sort of “anchor checkpoint” functionality, where it essentially treats that block as the genesis, should it accept a hash, a height, or require both? Both is ideal, since it could start asking the network for headers after the given hash and it already knows the absolute height. But perhaps it is a burden on users to pass around both (no idea)? Height has the potential ambiguity, so it should probably be avoided. What about hash? With just the hash, a client can start asking for headers till the chain tip, but then it only knows the relative height. It doesn’t know how many blocks came before the given anchor hash. Unless…we grab some block data and parse the BIP-34 info!
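Here is roughly what that could look like, as a sketch under a few assumptions: the client has already fetched the anchor block's raw bytes somehow, the block is post-BIP-34, and the function names are all made up. It skips the header, finds the coinbase scriptSig, and reads the height push off the front.

```python
from typing import Tuple


def read_compact_size(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a Bitcoin variable-length integer, returning (value, new position)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(buf[pos + 1:pos + 1 + size], "little"), pos + 1 + size


def coinbase_script_sig(raw_block: bytes) -> bytes:
    """Pull the coinbase scriptSig out of a serialized block."""
    pos = 80                                    # fixed-size header
    _tx_count, pos = read_compact_size(raw_block, pos)
    pos += 4                                    # coinbase tx version
    if raw_block[pos] == 0x00:                  # SegWit marker + flag bytes
        pos += 2
    input_count, pos = read_compact_size(raw_block, pos)
    if input_count != 1:
        raise ValueError("coinbase must have exactly one input")
    pos += 36                                   # null txid (32) + output index (4)
    script_len, pos = read_compact_size(raw_block, pos)
    return raw_block[pos:pos + script_len]


def height_from_block(raw_block: bytes) -> int:
    """Decode the BIP-34 height. Only meaningful for post-BIP-34 blocks."""
    script_sig = coinbase_script_sig(raw_block)
    push_len = script_sig[0] if script_sig else 0
    if not 1 <= push_len <= 8 or len(script_sig) < 1 + push_len:
        raise ValueError("scriptSig does not start with a short data push")
    body = script_sig[1:1 + push_len]
    if body[-1] & 0x80:
        raise ValueError("negative script number cannot be a height")
    return int.from_bytes(body, "little")


def absolute_height(anchor_height: int, headers_after_anchor: int) -> int:
    """Turn a height relative to the anchor into an absolute one."""
    return anchor_height + headers_after_anchor
```

So an anchor given only as a hash costs one block download, after which every header synced past it gets an absolute height via `absolute_height`. The client should still check that the downloaded block actually hashes to the anchor and that the coinbase is committed to by the header's merkle root, which this sketch leaves out.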

No idea if there is any demand for this, but kinda cool.