Rob has a large proposal for the data structure used to hold the blockchain state in kyoto
. While the blockchain in its simplest form is just a long list of blocks, it is not always accessed that way. Especially in those special moments when there is a fork and the network needs to figure out which way to go.
A complicated moment in the life of a young blockchain.
In this example, at block height 5
there are actually two valid block hashes and their history doesn’t tie together until block height 1
. Luckily, the bitcoin protocol usually gets this iron’d out after a few blocks (so maybe an hour or so). But deciding which block headers make up the canonical blockchain is a light clients number one priority. Essentially all other functionality stems from it.
At a high level, the interface to a light client is “Tell me which addresses you care about, and I’ll toss back blocks which contain relevant transactions”, so a filter system. In a simple world of a linear blockchain, the light client hears about a new block, gets its compact filter, checks if a hit, and pulls data from the network (e.g. the transactions) if necessary. But we if the above scenario happens, and a relevant transaction shows up in Block 5C
. Should the client inform the user?
One option would be for a client to just enter some sort of “fork detected” state and not return anything until it clears up. But this isn’t too useful because there is a good chance it doesn’t detect the fork until after returning things from the other branch. Imagine its tossing back info from Block 2A
before it hears about Block 2B
. It still needs to perform some sort of reconciliation with the caller. “Hey man, those things I said about Block 2A
, maybe forget about them”.
Reconciliation is un-avoidable, so the client might as well keep tossing back data until a re-organization occurs. Blocks in competing forks almost always have the same amount of proof of work (unless something weird like on a difficulty adjustment boundary). But clients will hear about the blocks at a different times. What is one’s canonical branch is another’s fork. But at some point they will converge on one branch having more proof of work.
So if kyoto
gets unlucky and starts using a fork, it just needs to switch over to the branch with more work when it hears about it and report the re-organized blocks to the caller. An exceptionally painful scenario might be if the network is split on two forks and they keep getting one block ahead of each other. But the same strategy applies, “these blocks are out, these ones are in!”…just over and over until it settles.
Reconciliation works even if a fork occurs all the way back at the genesis block. The basic idea doesn’t change, which is good, because some of bitcoin’s test networks have some pretty crazy re-orgs. But the algorithm can be tailored to bitcoin’s mainnet where tons of proof of work has solidified hundreds of thousands of blocks. Re-orgs generally happen right at the end near the tips and only for a handful of blocks. One of the bigger ones ever was just four blocks deep.
The bitcoin p2p protocol takes advantage of this heuristic with the locators pattern which was codified by Satoshi back in the day.
The getblocks
message which is very similar to the getheaders
one.
Nodes ask for blocks, or just the headers, from their peers by sending some locator hashes and a stop hash. Technically the locators are optional, an empty list implying that the client knows of no blocks yet. But practically speaking, at least the genesis hash can be hardcoded in a client. But that hints at what the locators are for, the client is saying “I know of these blocks, when you find a hash that you agree with, can you send me the next however many blocks until the stop hash?”. The locator hashes should be ordered newest to oldest, although they are just hashes, so no way for a peer to check that order. But usually it is in both parties interests to keep bandwidth use to a minimum.
With that said, the protocol is flexible as which hashes to send. And there might be certain scenarios which require some weird strategies. But a very solid heuristic is just exponential back off. This takes advantage of the fact that forks generally occur in the more recent block, [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
. More recent hashes are dense, while the older ones get sparse.
A client can use this pattern to build up their initial state of the blockchain. It can also use them when it hears of a hash it has know idea what block it builds on. For example, imagine in the young blockchain scenario above that Genesis
instead represents thousands of solidified blocks. The client is chugging along on the A
chain and out of no where it hears of Block 4C
. It hasn’t heard of Block 3C
or Block 2B
yet. If it follows the exponential back off locators pattern, it could turn to the peer and say “Hey I know of Block 4A
, Block 2A
…”. The peer only knows of the C
chain, but they will overlap at some point and it can now sync the C
chain to client.
BlockTree
The heart of the kyoto
refactor is the new BlockTree
type.
pub struct BlockTree {
canonical_hashes: BTreeMap<Height, BlockHash>,
headers: HashMap<BlockHash, BlockNode>,
active_tip: Tip,
candidate_forks: Vec<Tip>,
network: Network,
}
The new BlockTree
type.
The main responsibility of a BlockTree
is to efficiently maintain the canonical blockchain. It allows efficient access by height or hash, but all the complex logic is about adding headers and switching chains if necessary. A BlockTree
is entirely in memory, it doesn’t worry about how it gets its data. That is handled a layer up, kind of the chain manager layer, where things get a bit murky. But data comes from either a trusted persisted source (e.g. the filesystem) or the network (untrusted peers).
A BlockTree
can be initialized on the genesis block (known hardcode) or from any header hash used as a checkpoint. Using a checkpoint doesn’t necessarily require less security, this would just be the portion of the blockchain in memory. This allows a caller to efficiently use the data structure, perhaps they know they are only looking for new blocks, so no need to hold everything in memory.
Chain, Node, and Peers
The Chain
type from the chain
module wraps the BlockTree
, a bit of a middleware between the in-memory blockchain and the user facing Node
, but a more fleshed out interface. It adds persistence of the whole blockchain and what to ask peers for when missing data.
High level modules of kyoto
.