The next step in the BlockTree refactor is folding the compact block filters into the blockchain data structure.
BIP-157
BIP-157 defines the primitives for compact block filters.
- FilterHash // A double SHA-256 hash of the raw contents of a compact block filter. A commitment to the filter.
- FilterHeader // A double SHA-256 hash of the FilterHash and the previous FilterHeader. Chains the filters together to create a verifiable sequence. Forces a peer to commit ahead of time to all block filters.
There is a filter header and hash for each block. When a client asks the network for a filter, it can verify the filter against the hash, and it can trust that hash (to a certain extent) because it verified it against the header.
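For concreteness, here is that derivation in code. This is a minimal sketch using the sha2 crate, with made-up function names, not the helpers kyoto actually uses.
use sha2::{Digest, Sha256};

// Double SHA-256, the hash used throughout BIP-157.
fn sha256d(data: &[u8]) -> [u8; 32] {
    Sha256::digest(Sha256::digest(data)).into()
}

// FilterHash: double SHA-256 of the serialized filter bytes.
fn filter_hash(filter_bytes: &[u8]) -> [u8; 32] {
    sha256d(filter_bytes)
}

// FilterHeader: double SHA-256 of the filter hash concatenated with the
// previous filter header, which is what chains the filters together.
fn filter_header(filter_hash: &[u8; 32], prev_filter_header: &[u8; 32]) -> [u8; 32] {
    let mut buf = [0u8; 64];
    buf[..32].copy_from_slice(filter_hash);
    buf[32..].copy_from_slice(prev_filter_header);
    sha256d(&buf)
}
A sketch of the BIP-157 hash and header derivation.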
So the goal is to have BlockTree manage not only the blockchain state, but also the corresponding block filter header chain state. While the blockchain has the simple “most work” heuristic, the block filter header chain has no equivalent; choosing one is more hand-wavy, more a question of what the network is feeling. The client has to choose a block filter header chain for its chosen blockchain.
One way to do this is just ask a bunch of peers for their filter header chains. As long as some number agree, go with that. Actually validating a filter, and then the filter’s header, requires downloading all the block’s transactions from the network. This is a relatively high cost for a light client, but could be used as an ultimate backstop.
Updating the Graph
The refactor starts at the very bottom: the BlockNode struct of the BlockTree is extended with filter_commitment and filter_checked fields. The FilterCommitment type is the natural pairing of a filter hash and a filter header for a block. The assumption here is that the commitment has been validated somewhere upstream and is safe to associate with the block.
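Sketched out below, assuming rust-bitcoin’s bip158 types and a made-up field layout; kyoto’s actual structs surely carry more than this.
use bitcoin::bip158::{FilterHash, FilterHeader};
use bitcoin::block::Header;

// The natural pairing of the two commitments for one block.
#[derive(Debug, Clone, Copy)]
pub(crate) struct FilterCommitment {
    pub filter_hash: FilterHash,
    pub filter_header: FilterHeader,
}

// Rough shape of a node in the block graph after the refactor.
pub(crate) struct BlockNode {
    pub header: Header,
    pub height: u32,
    // ... plus whatever links the graph needs ...
    // Present once a validated hash/header pair is associated with the block.
    pub filter_commitment: Option<FilterCommitment>,
    // True once the filter itself has been downloaded and checked.
    pub filter_checked: bool,
}
A sketch of the extended node, not kyoto’s actual definition.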
The filter_checked flag is a little more interesting. It is flipped to true after the client pulls a filter from the network and verifies it against its chosen filter header chain. It doesn’t matter at this point whether the filter is a hit for anything the user cares about; that is handled separately. A client could choose to store the filter itself here, in case the user needs to recheck it at any point. However, this would dramatically increase storage costs (an order of magnitude or two). So kyoto opts to download a block’s data on a match, toss that back to the user, and just check off that the work is done.
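Traced as code, a hypothetical flow rather than kyoto’s API, reusing the BlockNode and sha256d sketches above:
use bitcoin::hashes::Hash;

// Verify a downloaded filter against the stored commitment and check it off.
fn verify_and_check_off(node: &mut BlockNode, filter_bytes: &[u8]) -> bool {
    let Some(commitment) = node.filter_commitment else {
        return false; // no commitment yet, nothing to verify against
    };
    // The downloaded filter must hash to the committed FilterHash.
    if sha256d(filter_bytes) != commitment.filter_hash.to_byte_array() {
        return false;
    }
    // Record that the work is done. The filter bytes themselves are dropped;
    // whether the filter matched any wallet scripts (and so triggers a block
    // download) is handled separately.
    node.filter_checked = true;
    true
}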
I wondered if it would be worth validating a block filter at this point. The client has already gathered the data inputs to create the filter, namely the block data it is returning to the user. Creating a filter sounds like a non-negligible amount of work, so it might still be too expensive for a light client, but maybe? Well, I forgot a part of BIP-158 here. BIP-158 describes how filters are created, and one of the inputs is the scriptPubKeys associated with each input’s outpoint (previous transaction plus output index, txid + vout, like a light UTXO). This makes sense: this is how a light client would pick up that one of its addresses is being spent without any knowledge of the chain (i.e. without knowing the outpoint).
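A rough sketch of the element set for a basic filter, using rust-bitcoin types; the helper name and the prevout lookup are hypothetical, and the actual Golomb-coded set construction is skipped.
use bitcoin::{Block, OutPoint, ScriptBuf};

// Collect the scripts that feed a BIP-158 basic filter: every output script
// created in the block (minus OP_RETURN) plus the script being spent by every
// non-coinbase input. The prevout lookup is exactly what a light client lacks.
fn basic_filter_inputs(
    block: &Block,
    prevout_script: impl Fn(&OutPoint) -> Option<ScriptBuf>,
) -> Option<Vec<ScriptBuf>> {
    let mut scripts = Vec::new();
    for tx in &block.txdata {
        for out in &tx.output {
            if !out.script_pubkey.is_op_return() {
                scripts.push(out.script_pubkey.clone());
            }
        }
        for input in &tx.input {
            // The coinbase input has a null previous output and is skipped.
            if input.previous_output.is_null() {
                continue;
            }
            scripts.push(prevout_script(&input.previous_output)?);
        }
    }
    // The real construction deduplicates these before Golomb-coding them.
    Some(scripts)
}
The inputs to a basic filter, per BIP-158.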
A Filter Header Chain
So about that assumption mentioned above. How does kyoto choose its header chain? This is the complex part of the process.
For now, this lives in the chain module of the chain module, a bit of a name nest. Maybe a better name for the child module is manager or orchestrator, since it moves the state along, creating and digesting messages from some data sources, most likely the network. Still, from a high level, the chain module is pulling data from the network module. The GetCFHeaders message defined in BIP-157 shows what is happening at a low level.
[1 byte] filter_type // Usually 0x00 for the "basic" filter type
[4 bytes] start_height // Little-endian uint32
[32 bytes] stop_hash // Hash of the last block in the range
The GetCFHeaders message on the wire.
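Spelled out as bytes, just to make the layout concrete; this is illustrative only, not how kyoto builds the message.
// Build the 37-byte GetCFHeaders payload by hand.
fn getcfheaders_payload(start_height: u32, stop_hash: [u8; 32]) -> Vec<u8> {
    let mut payload = Vec::with_capacity(37);
    payload.push(0x00); // filter_type: 0x00, the "basic" filter
    payload.extend_from_slice(&start_height.to_le_bytes()); // little-endian uint32
    payload.extend_from_slice(&stop_hash); // hash of the last block in the range
    payload
}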
Why mix heights and hashes? Hashes are more exact: no ambiguity if there is some fork action going on. They allow light clients to ask for headers even if they don’t yet know the exact height of a block they just learned about. But heights are used to index known blocks; the “right” (a.k.a. “old”) side of the blockchain is set in stone. So the asymmetric approach fits the most common use case, syncing the most recent few blocks.
This large refactor has the chain module creating a FilterRequestState, asking the network module, which asks a bunch of peers and tosses the responses back to chain. chain validates the first response and then counts up any matching responses. Once it hits some threshold (e.g. 3 responses agree), it adds them to the BlockTree state. If there are any conflicting responses, it kills the request and tries again.
This agreement processing is necessary since there is no heuristic baked into the chain. What isn’t clear to me is whether the processing should live in chain or network. In this case, there might be some benefit to having network, which is aware of peer reputations, decide what is a worthy response. On the other hand, the chain module will always have to validate the filter header chain no matter what, like it does now in sync_cf_headers. Right now the higher-level node module has a binary ban-or-no-ban system for peers which send conflicting filter header chains.
Here is the state machine that chain (and, kind of, the higher-level node) pushes along.
pub(crate) struct FilterRequestState {
    // The in-flight request for filters, if any.
    pub last_filter_request: Option<FilterRequest>,
    // The in-flight request for filter headers, if any.
    pub last_filter_header_request: Option<FilterHeaderRequest>,
    // A batch of filter headers waiting on agreement from more peers.
    pub pending_batch: Option<CFHeaderBatch>,
    // Tally of peer agreement against the required threshold.
    pub agreement_state: FilterHeaderAgreements,
}
#[derive(Debug, Clone, Copy)]
pub(crate) struct FilterRequest {
    #[allow(unused)]
    pub start_height: u32,
    pub stop_hash: BlockHash,
}
#[derive(Debug, Clone, Copy)]
pub(crate) struct FilterHeaderRequest {
    pub start_height: u32,
    pub stop_hash: BlockHash,
    pub expected_prev_filter_header: Option<FilterHeader>,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) struct FilterHeaderAgreements {
    current: u8,
    required: u8,
}
Types for the state.
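To show how the agreement tally might move, here is a hypothetical impl on FilterHeaderAgreements; these methods are not kyoto’s.
impl FilterHeaderAgreements {
    // A peer's CFHeaders response matched the pending batch.
    fn agree(&mut self) {
        self.current = self.current.saturating_add(1);
    }

    // Enough peers agree; the batch can be committed to the BlockTree.
    fn met(&self) -> bool {
        self.current >= self.required
    }
}
A conflicting response would instead drop the pending batch and reset the tally before the request is retried.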
The chain is continually going through three phases. First, blocks are synced and a canonical chain is chosen. Next, filter headers are synced and another chain is chosen. Finally, the filters themselves are checked to see if a block contains address activity.
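Summed up as a hypothetical enum, not a type that exists in kyoto:
enum SyncPhase {
    // Download block headers and follow the chain with the most work.
    BlockHeaders,
    // Download compact filter headers and pick a chain by peer agreement.
    FilterHeaders,
    // Download the filters themselves and scan them for wallet scripts.
    Filters,
}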