The next step in the BlockTree refactor is folding the compact block filters into the blockchain data structure.
BIP-157
BIP-157 defines the primitives for compact block filters.
- FilterHash // A double SHA-256 hash of the raw contents of a compact block filter. A commitment to the filter.
- FilterHeader // A double SHA-256 hash of the FilterHash and the previous FilterHeader. Chains the filters together to create a verifiable sequence. Forces a peer to commit ahead of time to all block filters.
There is a filter header and hash for each block. When a client asks the network for a filter, it can verify the filter against the hash, and it can trust that hash (to a certain extent) because it verified it against the header.
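For concreteness, here is that derivation in code. This is a minimal sketch using the sha2 crate, with made-up function names, not the helpers kyoto actually uses.
use sha2::{Digest, Sha256};

// Double SHA-256, the hash used throughout BIP-157.
fn sha256d(data: &[u8]) -> [u8; 32] {
    Sha256::digest(Sha256::digest(data)).into()
}

// FilterHash: double SHA-256 of the serialized filter bytes.
fn filter_hash(filter_bytes: &[u8]) -> [u8; 32] {
    sha256d(filter_bytes)
}

// FilterHeader: double SHA-256 of the filter hash concatenated with the
// previous filter header, which is what chains the filters together.
fn filter_header(filter_hash: &[u8; 32], prev_filter_header: &[u8; 32]) -> [u8; 32] {
    let mut buf = [0u8; 64];
    buf[..32].copy_from_slice(filter_hash);
    buf[32..].copy_from_slice(prev_filter_header);
    sha256d(&buf)
}
A sketch of the BIP-157 hash and header derivation.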
So the goal is to have BlockTree manage not only the blockchain state, but also the corresponding block filter header chain state. While the blockchain has the simple “most work” heuristic, the block filter header chain has no equivalent; choosing one is more hand-wavy, more a question of what the network is feeling. The client has to choose a block filter header chain for its chosen blockchain.
One way to do this is just ask a bunch of peers for their filter header chains. As long as some number agree, go with that. Actually validating a filter, and then the filter’s header, requires downloading all the block’s transactions from the network. This is a relatively high cost for a light client, but could be used as an ultimate backstop.
Updating the Graph
The refactor starts at the very bottom: the BlockNode struct of the BlockTree is extended with filter_commitment and filter_checked fields. The FilterCommitment type is the natural pairing of a filter hash and a filter header for a block. The assumption here is that the commitment has been validated somewhere upstream and is safe to associate with the block.
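Sketched out below, assuming rust-bitcoin’s bip158 types and a made-up field layout; kyoto’s actual structs surely carry more than this.
use bitcoin::bip158::{FilterHash, FilterHeader};
use bitcoin::block::Header;

// The natural pairing of the two commitments for one block.
#[derive(Debug, Clone, Copy)]
pub(crate) struct FilterCommitment {
    pub filter_hash: FilterHash,
    pub filter_header: FilterHeader,
}

// Rough shape of a node in the block graph after the refactor.
pub(crate) struct BlockNode {
    pub header: Header,
    pub height: u32,
    // ... plus whatever links the graph needs ...
    // Present once a validated hash/header pair is associated with the block.
    pub filter_commitment: Option<FilterCommitment>,
    // True once the filter itself has been downloaded and checked.
    pub filter_checked: bool,
}
A sketch of the extended node, not kyoto’s actual definition.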
The filter_checked flag is a little more interesting. It is flipped to true after the client pulls a filter from the network and verifies it against its chosen filter header chain. It doesn’t matter at this point whether the filter is a hit for anything the user cares about; that is handled separately. A client could choose to store the filter itself here, in case the user needs to recheck it at any point. However, this would dramatically increase storage costs (an order of magnitude or two). So kyoto opts to download a block’s data on a match, toss that back to the user, and just check off that the work is done.
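Traced as code, a hypothetical flow rather than kyoto’s API, reusing the BlockNode and sha256d sketches above:
use bitcoin::hashes::Hash;

// Verify a downloaded filter against the stored commitment and check it off.
fn verify_and_check_off(node: &mut BlockNode, filter_bytes: &[u8]) -> bool {
    let Some(commitment) = node.filter_commitment else {
        return false; // no commitment yet, nothing to verify against
    };
    // The downloaded filter must hash to the committed FilterHash.
    if sha256d(filter_bytes) != commitment.filter_hash.to_byte_array() {
        return false;
    }
    // Record that the work is done. The filter bytes themselves are dropped;
    // whether the filter matched any wallet scripts (and so triggers a block
    // download) is handled separately.
    node.filter_checked = true;
    true
}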
I wondered if it would be worth validating a block filter at this point. The client has already gathered the data inputs to create the filter, namely the block data it is returning to the user. Creating a filter sounds like a non-negligible amount of work, so it might still be too expensive for a light client, but maybe? Well, I forgot a part of BIP-158 here. BIP-158 describes how filters are created, and one of the inputs is the scriptPubKeys associated with each input’s outpoint (previous transaction plus output index, txid + vout, like a light UTXO). This makes sense: this is how a light client would pick up that one of its addresses is being spent without any knowledge of the chain (i.e. without knowing the outpoint).
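A rough sketch of the element set for a basic filter, using rust-bitcoin types; the helper name and the prevout lookup are hypothetical, and the actual Golomb-coded set construction is skipped.
use bitcoin::{Block, OutPoint, ScriptBuf};

// Collect the scripts that feed a BIP-158 basic filter: every output script
// created in the block (minus OP_RETURN) plus the script being spent by every
// non-coinbase input. The prevout lookup is exactly what a light client lacks.
fn basic_filter_inputs(
    block: &Block,
    prevout_script: impl Fn(&OutPoint) -> Option<ScriptBuf>,
) -> Option<Vec<ScriptBuf>> {
    let mut scripts = Vec::new();
    for tx in &block.txdata {
        for out in &tx.output {
            if !out.script_pubkey.is_op_return() {
                scripts.push(out.script_pubkey.clone());
            }
        }
        for input in &tx.input {
            // The coinbase input has a null previous output and is skipped.
            if input.previous_output.is_null() {
                continue;
            }
            scripts.push(prevout_script(&input.previous_output)?);
        }
    }
    // The real construction deduplicates these before Golomb-coding them.
    Some(scripts)
}
The inputs to a basic filter, per BIP-158.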
A Filter Header Chain
So about that assumption mentioned above. How does kyoto choose its header chain? This is the complex part of the process.
For now, this lives in the chain module of the chain module, a bit of a name nest. Maybe a better name for the child module is manager or orchestrator, since it moves the state along, creating and digesting messages from some data sources, most likely the network. Still, from a high level, the chain module is pulling data from the network module. The GetCFHeaders message defined in BIP-157 shows what is happening at a low level.
[1 byte] filter_type // Usually 0x00 for the "basic" filter type
[4 bytes] start_height // Little-endian uint32
[32 bytes] stop_hash // Hash of the last block in the range
The GetCFHeaders message on the wire.
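Spelled out as bytes, just to make the layout concrete; this is illustrative only, not how kyoto builds the message.
// Build the 37-byte GetCFHeaders payload by hand.
fn getcfheaders_payload(start_height: u32, stop_hash: [u8; 32]) -> Vec<u8> {
    let mut payload = Vec::with_capacity(37);
    payload.push(0x00); // filter_type: 0x00, the "basic" filter
    payload.extend_from_slice(&start_height.to_le_bytes()); // little-endian uint32
    payload.extend_from_slice(&stop_hash); // hash of the last block in the range
    payload
}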
Why mix heights and hashes? Hashes are more exact: no ambiguity if there is some fork action going on. They allow light clients to ask for headers even if they don’t yet know the exact height of a block they just learned about. But heights are used to index known blocks; the “right” (a.k.a. “old”) side of the blockchain is set in stone. So the asymmetric approach fits the most common use case, syncing the most recent few blocks.
This large refactor has the chain module creating a FilterRequestState, asking the network module, which asks a bunch of peers and tosses the responses back to chain. chain validates the first response and then counts up any matching responses. Once it hits some threshold (e.g. 3 responses agree), it adds them to the BlockTree state. If there are any conflicting responses, it kills the request and tries again.
This agreement processing is necessary since there is no heuristic baked into the chain. What isn’t clear to me is whether the processing should live in chain or network. In this case, there might be some benefit to having network, which is aware of peer reputations, decide what is a worthy response. On the other hand, the chain module will always have to validate the filter header chain no matter what, like it does now in sync_cf_headers. Right now the higher-level node module has a binary ban-or-no-ban system for peers which send conflicting filter header chains.
Here is the state machine that chain (and, kind of, the higher-level node) pushes along.
pub(crate) struct FilterRequestState {
    // The in-flight request for filters, if any.
    pub last_filter_request: Option<FilterRequest>,
    // The in-flight request for filter headers, if any.
    pub last_filter_header_request: Option<FilterHeaderRequest>,
    // A batch of filter headers waiting on agreement from more peers.
    pub pending_batch: Option<CFHeaderBatch>,
    // Tally of peer agreement against the required threshold.
    pub agreement_state: FilterHeaderAgreements,
}
#[derive(Debug, Clone, Copy)]
pub(crate) struct FilterRequest {
    #[allow(unused)]
    pub start_height: u32,
    pub stop_hash: BlockHash,
}
#[derive(Debug, Clone, Copy)]
pub(crate) struct FilterHeaderRequest {
    pub start_height: u32,
    pub stop_hash: BlockHash,
    pub expected_prev_filter_header: Option<FilterHeader>,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) struct FilterHeaderAgreements {
    current: u8,
    required: u8,
}
Types for the state.
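To show how the agreement tally might move, here is a hypothetical impl on FilterHeaderAgreements; these methods are not kyoto’s.
impl FilterHeaderAgreements {
    // A peer's CFHeaders response matched the pending batch.
    fn agree(&mut self) {
        self.current = self.current.saturating_add(1);
    }

    // Enough peers agree; the batch can be committed to the BlockTree.
    fn met(&self) -> bool {
        self.current >= self.required
    }
}
A conflicting response would instead drop the pending batch and reset the tally before the request is retried.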
The chain is continually going through three phases. First, blocks are synced and a canonical chain is chosen. Next, filter headers are synced and another chain is chosen. Finally, the filters themselves are checked to see if a block contains address activity.
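Summed up as a hypothetical enum, not a type that exists in kyoto:
enum SyncPhase {
    // Download block headers and follow the chain with the most work.
    BlockHeaders,
    // Download compact filter headers and pick a chain by peer agreement.
    FilterHeaders,
    // Download the filters themselves and scan them for wallet scripts.
    Filters,
}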