Nostr

// Censorship-resistant speech

Operator! Get me the president of the world!

Nostr (Notes and Other Stuff Transmitted by Relays) is a protocol for sharing messages in a decentralized, censorship-resistant fashion.

It’s very common in the web 2.0 world for users to have an account @service. They authenticate with that service’s server and trust it with their identity and data. There are a lot of usability benefits to this “dumb client/smart server” paradigm. But taking a step back and looking at the landscape today, there appears to be some sort of feedback loop here which leads to a few super-powerful silos of data. This centralized power gets abused.

Nostr is trying to break this feedback loop by flipping the paradigm. Instead of being granted an identity by a service, the user owns their identity; they are sovereign. Nostr brings some smarts back to the client by forcing clients to sign all chunks of a user’s data with their cryptographic key. It might be a relatively small tweak to the web of today, but the theory is that this could snowball into a decentralized ecosystem. That ecosystem is made up of identities, events, and relays. Nostr is not attempting to create consensus on the events like bitcoin does. No part of the protocol dictates that the relay services talk to each other; they are relatively dumb. But hopefully the purposefully simple protocol has just enough structure to facilitate this paradigm flip.

Simple might not be the best way to describe the ecosystem. Perhaps locked-in? The hard part of the nostr vision is returning identity to the user. Getting people to actually use cryptographic keys has historically been a tough sell. So I think the strategy is to purposefully lock in all other parts of the system (e.g. only supporting websocket transport) to bootstrap a network effect. If we can get people to adopt keys for identity, the other rules could probably be loosened up.

Identity

Identity in nostr is just a public/private key pair. Clients create messages, sign them with their private key, and then post them to relays. What are the goals here?

Nostr uses Schnorr signatures over the secp256k1 curve, like bitcoin. By convention, the keys are usually tossed around in bech32-formatted strings to help us humans. This convention is defined in NIP-19. The NIP includes an example: the hex public key 3bf0c63fcb93463407af97a5e5ee64fa883d107ef9e558472c4eb9aaaefa459d translates to npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6. Like bitcoin, the plaintext prefix can dictate what information is stored in the string. npub is a user’s public key.
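To make that example concrete, here is a small self-contained Python sketch of the npub encoding: regroup the 32 bytes of the public key into 5-bit values, then bech32-encode them with the npub prefix. The checksum math follows the bech32 reference code; a real client would just pull in a bech32 library instead.

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def bech32_polymod(values):
    # BCH checksum used by bech32 (straight from the reference implementation).
    generator = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = (chk & 0x1FFFFFF) << 5 ^ value
        for i in range(5):
            chk ^= generator[i] if ((top >> i) & 1) else 0
    return chk

def bech32_encode(hrp, data):
    # Expand the human-readable part, compute the 6-value checksum, map to the charset.
    values = [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp] + data
    polymod = bech32_polymod(values + [0] * 6) ^ 1
    checksum = [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + checksum)

def to_five_bit(data):
    # Regroup 8-bit bytes into 5-bit values, padding the tail with zero bits.
    acc, bits, out = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append((acc >> bits) & 31)
    if bits:
        out.append((acc << (5 - bits)) & 31)
    return out

hex_pubkey = "3bf0c63fcb93463407af97a5e5ee64fa883d107ef9e558472c4eb9aaaefa459d"
print(bech32_encode("npub", to_five_bit(bytes.fromhex(hex_pubkey))))
# npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6

A sketch of the hex-to-npub translation from the NIP-19 example.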

web+nostr // NIP-21

So you want to share your nostr public key somewhere so people can message you? NIP-21 defines a URI scheme for apps to use, pretty straightforward: nostr: followed by an NIP-19 entity. I am seeing web+nostr out in the wild, but I am not sure why the web+ is being added in this case. It is mentioned in the HTML spec for custom URI scheme handlers: “Effectively namespaces web-based protocols from other, potentially less web-secure, protocols.”
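For example, the npub from above becomes the URI below; presumably the web+ flavor is the same string with the extra prefix tacked on.

nostr:npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6
web+nostr:npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6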

Events

Events are the only object in the nostr system. Their structure is defined in NIP-01.

{
  "id": <32-bytes lowercase hex-encoded sha256 of the serialized event data>,
  "pubkey": <32-bytes lowercase hex-encoded public key of the event creator>,
  "created_at": <unix timestamp in seconds>,
  "kind": <integer between 0 and 65535>,
  "tags": [
    [<arbitrary string>...],
    // ...
  ],
  "content": <arbitrary string>,
  "sig": <64-bytes lowercase hex of the signature of the sha256 hash of the serialized event data, which is the same as the "id" field>
}  

Required event fields.

id

Since this system is designed to be extremely decentralized, with many different implementations, it is important that events can only be serialized one way, or else interoperability goes out the window. So how do we lay out these bits to then hash them and get the id?

[
  0,
  <pubkey, as a lowercase hex string>,
  <created_at, as a number>,
  <kind, as a number>,
  <tags, as an array of arrays of non-null strings>,
  <content, as a string>
]  

ID hash serialization layout.

There are very specific rules for this layout so that it is deterministic, including that it is a UTF-8 JSON-serialized string with no extra whitespace. So the bits can only land in one way. I am not sure yet why the 0 is used in place of the still-to-be-calculated id. Maybe just to be clear about the use case of this layout? The fields always land in this fixed order, so every implementation hashes the exact same bytes and arrives at the same id.
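As a rough Python sketch of that calculation, assuming json.dumps with compact separators gets close enough to the NIP-01 serialization rules (the NIP's exact escaping rules for content deserve a careful read in a real implementation):

import hashlib
import json

def event_id(pubkey, created_at, kind, tags, content):
    # Serialize the [0, pubkey, created_at, kind, tags, content] array as
    # compact UTF-8 JSON, then take the sha256 of those bytes.
    serialized = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

A sketch of the id calculation.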

Once the id, aka the hash commitment of the event, is found, it is used to create a signature for the event. This is the classic “hey, the owner of this secret private key signed off on this stuff” pattern.

kind

The kind specifies how to interpret the fields of the event. It is possible for a tag to mean totally different things based on the kind, so it is sketchy to attempt to classify an event just based on its shape (duck typing).

There are kind ranges defined in NIP-01 which carry some extra characteristics.

  • Regular // Regular events stored by relays. 1000 <= n < 10000 || 4 <= n < 45 || n == 1 || n == 2.
  • Replaceable // For a combination of pubkey and kind, only the latest is held on to. 10000 <= n < 20000 || n == 0 || n == 3.
  • Ephemeral // Not held onto at all. 20000 <= n < 30000.
  • Addressable // For a combination of pubkey, kind, and d-tag, only the latest is held on to. 30000 <= n < 40000.

NIP-10 defines the omnipresent kind 1.
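To make the ranges concrete, a tiny sketch that maps a kind to the storage class described above:

def kind_class(kind):
    # Storage classes from the NIP-01 kind ranges.
    if 10000 <= kind < 20000 or kind in (0, 3):
        return "replaceable"
    if 20000 <= kind < 30000:
        return "ephemeral"
    if 30000 <= kind < 40000:
        return "addressable"
    if 1000 <= kind < 10000 or 4 <= kind < 45 or kind in (1, 2):
        return "regular"
    return "unclassified"

kind_class(1)   # "regular", the omnipresent text note
kind_class(0)   # "replaceable", user metadata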

Relays

Relays relay events between clients. And that is it!

Relays hold onto events posted by clients. They also dole them out to clients who ask for them. So, a store-and-forward relay. NIP-01 defines the bare minimum interface of a nostr relay.

  1. Accept WebSocket connections.
  2. Receive events and validate them (check signatures and IDs).
  3. Store valid events.
  4. Accept subscriptions (filters).
  5. Send matching events to subscribers.
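NIP-01 also pins down the wire format: everything is a JSON array over the WebSocket. A client sends ["REQ", <sub id>, <filter>] to subscribe, and the relay streams back ["EVENT", <sub id>, <event>] messages followed by ["EOSE", <sub id>] once stored events are exhausted. Below is a minimal sketch of the client side, assuming the third-party Python websockets package and a hypothetical relay URL.

import asyncio
import json
import websockets

async def fetch_notes():
    async with websockets.connect("wss://relay.example.com") as ws:
        # Ask for the ten most recent kind-1 text notes.
        await ws.send(json.dumps(["REQ", "my-sub", {"kinds": [1], "limit": 10}]))
        while True:
            msg = json.loads(await ws.recv())
            if msg[0] == "EVENT":      # ["EVENT", <sub id>, <event>]
                print(msg[2]["content"])
            elif msg[0] == "EOSE":     # relay is done sending stored events
                await ws.send(json.dumps(["CLOSE", "my-sub"]))
                break

asyncio.run(fetch_notes())

A sketch of the client side of the NIP-01 subscription flow.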

Maybe the one not dead-obvious design decision here is why only WebSockets? What are WebSockets?

websockets

The WebSocket protocol was introduced alongside HTML5 around 2011 with the goal of making communication on the web more performant by adding state to the connection. The original high-level protocol, HTTP, is stateless, with each connection being a single request and response between parties. When a WebSocket connection is established, it is persistent and creates a two-way channel between the parties. They can then just fire messages at each other without establishing a new connection each time.

WebSocket connections are still built on TCP/IP; they are not re-inventing the stack. WebSockets were actually designed to use as much of the existing networking stack as possible, even HTTP itself. The WebSocket protocol begins with an HTTP request instead of some custom handshake. This allows it to still work with a bunch of HTTP-only infrastructure out there like proxies and firewalls. The connection is then “upgraded” to a WebSocket if both parties support it.

The protocol a layer down, TCP, doesn’t actually care if the bytes it is sending are HTTP, WebSocket, or whatever else. It is up to the client and server to do the HTTP handshake over a TCP connection. They then collectively decide “from this exact byte in the TCP stream forward, we’re going to interpret everything as WebSocket frames instead of HTTP!”. This way they are still on the same page, but interpreting the bytes differently.
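The upgrade itself is just HTTP headers. The trace below is trimmed down from the example handshake in RFC 6455.

# Upgrade handshake
Client -> Server: GET /chat HTTP/1.1
                  Host: server.example.com
                  Upgrade: websocket
                  Connection: Upgrade
                  Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
                  Sec-WebSocket-Version: 13
Server -> Client: HTTP/1.1 101 Switching Protocols
                  Upgrade: websocket
                  Connection: Upgrade
                  Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
[Connection stays open, both sides now speak WebSocket frames]

The WebSocket upgrade handshake, trimmed from the example in RFC 6455.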

Sounds good, but should we be using this everywhere? Is it that powerful? Maybe it is helpful to break down the timeline of post-HTTP/1.0 protocols and their characteristics.

# First request
Client -> Server: SYN
Server -> Client: SYN-ACK
Client -> Server: ACK
Client -> Server: HTTP Request
Server -> Client: HTTP Response
[Connection closed]

# Second request
Client -> Server: SYN
Server -> Client: SYN-ACK
Client -> Server: ACK
Client -> Server: HTTP Request
Server -> Client: HTTP Response
[Connection closed]  

The first version of HTTP, HTTP/1.0, where every request gets its own connection.

Connection keep-alive was introduced with HTTP/1.1, first published in 1997 (RFC 2068) and revised in 1999 (RFC 2616). This allowed requests to share a TCP connection for at least a little bit.

# First request
Client -> Server: SYN
Server -> Client: SYN-ACK
Client -> Server: ACK
Client -> Server: HTTP Request
Server -> Client: HTTP Response
[Connection stays open]

# Second request
Client -> Server: HTTP Request
Server -> Client: HTTP Response
[Connection stays open]  

HTTP/1.1 with the TCP connection maintained with keep-alive.

keep-alive brought the connection overhead way down, but clients still had to resort to long-polling patterns in order to pull down new data and/or actually keep the connection alive. This is where WebSockets entered the picture (2011); they take the next logical step and push data both ways over the persistent connection.

With that said, HTTP/2 showed up in 2015 and has a very similar feature set to WebSockets. And HTTP/3 has also arrived on the scene in 2022 (RFC 9114). HTTP/3 is a massive departure from earlier versions since it removes the underlying TCP in favor of a new fancy UDP-based protocol, QUIC (Quick UDP Internet Connections). This modification of the network stack is a way bigger migration lift than a dynamic connection upgrade. And while we are here, WebTransport showed up in 2023 and is the natural successor to WebSockets, but built on HTTP/3 instead of HTTP/1.1.

Early nostr development started around 2021, which gives some insight into why it rolled with WebSockets. HTTP/3 was very fresh, not even finalized, so it could not be considered as a base protocol. WebSockets on the other hand were tried and true. That just leaves the question of why not HTTP/2? At its heart, HTTP/2 is still optimized for request/response, whereas WebSockets are a bi-directional stream. Given the nature of the client/relay relationship in nostr, WebSockets were the clear fit.

There are some challenges to WebSockets which are relevant to the nostr use case. Infrastructure like load balancers and proxies was originally developed with stateless HTTP in mind. Maintaining a connection complicates things, but it isn’t much of an issue these days since modern proxies have developed neat ways to support WebSockets. Probably a bigger issue for the nostr use case is the resources required for persistent connections. If a relay has hundreds or thousands of clients subscribed to it, that is a lot of parallel connections. Same for a client attempting to post to a ton of relays. But I believe there are ways to mitigate these resource requirements by leveraging the simplicity of the nostr topology.

The nostr ecosystem does not fundamentally depend on any aspect of websockets; it could work with just old school HTTP. Decentralization comes from the sovereign identity and data. Locking nostr to websockets allows any nostr client to connect to any nostr relay, no feature handshake negotiation required, bootstrapping a network effect with immediate interoperability. We want to get to a fully fledged sovereign identity ecosystem as quickly as possible, so let’s just lock in a reasonable transport.

topology

The extreme simplicity of the nostr topology is key to its power. Clients talk to relays, relays do not talk to each other. If a client wants wider distribution, it is on them to connect to more relays.

One can quickly think of many use cases which would be easier to implement if relays could talk to other relays. However, that introduces the boatload of terribly complex failure scenarios we see in federated systems.

  • Relay consensus.
  • Event forwarding loops.
  • Relay relationships.

There is nothing stopping someone from running a client which simply reads off one relay and posts to another, but it isn’t codified in any way in the protocol.
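For fun, here is a sketch of what that kind of bridging client could look like, assuming the third-party Python websockets package and hypothetical relay URLs. A real bridge would also need reconnects, rate limiting, and de-duplication.

import asyncio
import json
import websockets

SOURCE = "wss://relay-a.example.com"       # hypothetical relay being read from
DESTINATION = "wss://relay-b.example.com"  # hypothetical relay being posted to

async def bridge():
    async with websockets.connect(SOURCE) as src, websockets.connect(DESTINATION) as dst:
        # Subscribe to text notes on the source relay.
        await src.send(json.dumps(["REQ", "bridge", {"kinds": [1]}]))
        async for raw in src:
            msg = json.loads(raw)
            if msg[0] == "EVENT":
                # Repost the event untouched; its id and signature still verify,
                # so the bridge has no power to alter anyone's notes.
                await dst.send(json.dumps(["EVENT", msg[2]]))

asyncio.run(bridge())

A sketch of a client that reads off one relay and posts to another.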