Modules

// #Craft #Rust

kyoto is reaching a good spot in functionality, but I am having trouble wrapping my head around some of the data flows and internal dependencies. I have some suspicions about what I am tripping on, which I’ll dive into here.

Opinionated Runtime?

The crate’s external interface is a node and a client, but I am still struggling to see what the benefit is for the caller to deal with the node. BDK’s bdk-ffi library uses kyoto and has it wrapped up with some CbfNode and CbfClient structs. The CbfNode struct just wraps a node and tosses it in its own tokio runtime, the multi-thread variant, on a separate operating system thread. I guess this is maybe a more opinionated approach for callers of bdk-ffi?

If client and node were the only high level tasks exposed by kyoto, I think I could understand the clean break. The caller can decide how to run these two tasks. However, as of kyoto v0.10.0, there are two internal spots which call spawn. They are both in the network module, which makes sense as the I/O-heavy spot: peer_map::dispatch and peer::run. spawn implicitly reaches up to the runtime executor to add another root task. This is how concurrency is added to an app using tokio (or any async runtime, I guess); without it, everything would just be one big, sequential state machine. And it probably makes total sense to do it for these network use cases. But if kyoto is already talking to the runtime, why not just toss the node on its own task too and only give the client to the caller?
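For a picture of what those spawn sites amount to, here is a minimal sketch, assuming tokio and a made-up Peer type (not kyoto’s actual code): each spawn hands another root task to whatever runtime the caller started, so every peer connection can make progress concurrently.

```rust
struct Peer;

impl Peer {
    // Stand-in for kyoto's peer::run: read and write one peer's socket.
    async fn run(self) {}
}

// Stand-in for peer_map::dispatch: each connection becomes its own root
// task on the caller's runtime, so a slow peer can't stall the rest.
async fn dispatch(peers: Vec<Peer>) {
    for peer in peers {
        tokio::spawn(peer.run());
    }
}

#[tokio::main]
async fn main() {
    dispatch(vec![Peer, Peer]).await;
}
```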

There are some complexities if you don’t want to start running a heavy duty task in a builder context. Let’s say the caller is configuring the node, but is not ready for it to start processing work yet. If node and client are returned from the builder as separate objects, it is very clear to the caller when the node starts its work because it is on the caller to call node::run(). If the builder only exposes a client, and again, the builder shouldn’t start any major background tasks, then the client needs to expose its own client::run() to get things started. Under the hood this could fire a one-shot message which essentially releases the node’s event loop to do its thing. This results in a handful of subtle changes to the caller’s interface: one less task to deal with, but new knowledge that they have to call client::run() to kick things off. Although that could be ironed out with some intermediate type that consumes itself to produce a client, I am not sure it’s a net win given the loss of flexibility to put the node task on an optimized runtime.
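Here is a rough sketch of that one-shot release pattern, with invented Node, Client, and build names, assuming tokio’s oneshot channel; the real shape would obviously depend on kyoto’s builder.

```rust
use tokio::sync::oneshot;

struct Node {
    start: oneshot::Receiver<()>,
}

impl Node {
    async fn run(self) {
        // Park until the client releases us. If the client is dropped
        // without ever calling run(), bail out instead of hanging.
        if self.start.await.is_err() {
            return;
        }
        // ...the real event loop would begin here...
    }
}

struct Client {
    start: Option<oneshot::Sender<()>>,
}

impl Client {
    // Fire the one-shot that releases the node's event loop.
    fn run(&mut self) {
        if let Some(tx) = self.start.take() {
            let _ = tx.send(());
        }
    }
}

// Must be called from within a tokio runtime since it spawns the node.
fn build() -> Client {
    let (tx, rx) = oneshot::channel();
    tokio::spawn(Node { start: rx }.run());
    Client { start: Some(tx) }
}

#[tokio::main]
async fn main() {
    let mut client = build();
    client.run();
}
```

The self-consuming intermediate type variant would just move run() onto a builder-like type, fn run(self) -> Client, so the caller can’t forget to make the call.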

Keeping the node and client separate allows the client’s requirements to be simpler. Technically, any runtime can drive it since all it exposes is generic async functions. But the node is hardcoded to a tokio runtime since it dynamically spawns new tasks. Maybe this requirement could be hidden with an embedded runtime (e.g. tokio::runtime::Builder::new_multi_thread()) and tossing the node on that. Some libraries in the Rust space optionally provide this feature, like sqlx. I am not sure if there is heavy memory overhead or weird runtime conflicts possible with an embedded runtime though.
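Something in the spirit of bdk-ffi’s CbfNode wrapper, sketched with a made-up Node type: build a runtime internally on a dedicated OS thread and block it on the node, so the caller never touches tokio.

```rust
use std::thread;

struct Node;

impl Node {
    async fn run(self) { /* node event loop */ }
}

// Hide the tokio requirement behind a plain std thread that owns an
// embedded multi-thread runtime for the node's spawned tasks.
fn spawn_embedded(node: Node) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let rt = tokio::runtime::Builder::new_multi_thread()
            .enable_all()
            .build()
            .expect("failed to build embedded runtime");
        rt.block_on(node.run());
    })
}

fn main() {
    spawn_embedded(Node).join().unwrap();
}
```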

Scope and the Database

[diagram: kyoto with its child modules chain, network, node, database, and client]

High level modules of kyoto; chain and network are the beefy ones.

There is some hand waving here, not listing some smaller modules, but this is how I see the high level modules of kyoto. I have chain and network called out as especially complex modules; each has a bunch of child modules.

  • chain // Validates and holds the state of the blockchain and any forks.
  • network // Manages peers and requesting data from the network.
  • node // Coordinates between chain data, network data, and client requests.
  • database // Persists data between runs.
  • client // User interface for requesting data.

As it stands, there are a lot of connections between these modules. And this is complicated by the super dynamic nature of async code: there are a lot of possible flows depending on runtime conditions (e.g. when a network request returns). Ideally, the chain and network interfaces are simplified so it’s obvious to any contributor how data flows in and out of them. I haven’t sunk into network yet, but chain is complex enough to be its own crate (not that there is any demand for that). Giving it a scoped interface, even just for internal use, would make it easier to reason about kyoto as a whole.

Rob’s DETAILS.md notes on the structure are a little out of date now, stuff like filters has been folded into chain, but I can see that I am not totally off. There are some follow up questions I want to dive into more, though.

  • Can chain’s interface be made synchronous? If it doesn’t interact directly with the network or database, and offloads that to node, then I don’t think there is any need for it to be async, and it would be simpler to grasp how data flows in and out (there’s a sketch of this after the list).
  • network calls must be async, but does it have to be aware of the database, or can that be pushed over to node? Can its interface be simplified to plain data requests?
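To make the first question concrete, here is the boundary I am imagining, with all placeholder types (none of this is kyoto’s actual API): chain is a plain synchronous state machine, and node, which owns the async edges, does all of the awaiting.

```rust
struct Header;
struct HeaderError;

struct Chain;

impl Chain {
    // Purely CPU-bound validation: no await points, easy to test.
    fn accept_headers(&mut self, _headers: Vec<Header>) -> Result<(), HeaderError> {
        Ok(())
    }
}

struct Network;

impl Network {
    // The only genuinely async edge: fetching data from peers.
    async fn request_headers(&mut self) -> Vec<Header> {
        Vec::new()
    }
}

struct Node {
    chain: Chain,
    network: Network,
}

impl Node {
    // node does the awaiting; chain stays a synchronous state machine.
    async fn sync(&mut self) {
        let headers = self.network.request_headers().await;
        if self.chain.accept_headers(headers).is_err() {
            // handle invalid headers / fork logic here
        }
    }
}

#[tokio::main]
async fn main() {
    let mut node = Node { chain: Chain, network: Network };
    node.sync().await;
}
```

A synchronous chain like this is also trivially testable: feed it headers, assert on the resulting state, no runtime required.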

My thinking is that chain and network may define how their primitives are serialized, but maybe they don’t have to deal with the database calls themselves. Let node decide when things are persisted or loaded. For example, on startup node could pull the database and toss it at chain and network, kind of like a “start here”.
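Sketching that startup flow with invented snapshot types, assuming tokio: chain defines what its serialized state looks like, while node owns the database handle and decides when state moves to or from disk.

```rust
struct ChainSnapshot;

struct Chain;

impl Chain {
    // chain defines its serialized form, but never touches the database.
    fn from_snapshot(_snapshot: ChainSnapshot) -> Self {
        Chain
    }

    fn snapshot(&self) -> ChainSnapshot {
        ChainSnapshot
    }
}

struct Database;

impl Database {
    async fn load_chain(&self) -> ChainSnapshot {
        ChainSnapshot
    }

    async fn store_chain(&self, _snapshot: ChainSnapshot) {}
}

struct Node {
    chain: Chain,
    db: Database,
}

impl Node {
    // The "start here" handoff: load persisted state once at startup.
    async fn start(db: Database) -> Self {
        let chain = Chain::from_snapshot(db.load_chain().await);
        Node { chain, db }
    }

    // node alone decides when state is worth persisting.
    async fn checkpoint(&self) {
        self.db.store_chain(self.chain.snapshot()).await;
    }
}

#[tokio::main]
async fn main() {
    let node = Node::start(Database).await;
    node.checkpoint().await;
}
```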

Kinda off topic, but it sure would be nice to have an MSRV of 1.75+ so we could use async fn in traits. Would make defining dependencies way cleaner.
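For example, with Rust 1.75+ a database dependency could be declared directly as a trait with async functions, no async-trait macro or boxed futures; the HeaderStore trait here is invented for illustration.

```rust
struct Header;
struct DbError;

// Requires Rust 1.75+, which stabilized async fn in traits.
trait HeaderStore {
    async fn load(&mut self) -> Result<Vec<Header>, DbError>;
    async fn store(&mut self, headers: &[Header]) -> Result<(), DbError>;
}
```

The main caveat is that traits like this aren’t directly usable as dyn trait objects yet, so it fits generic bounds better than boxed dependencies.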