Nostr

// Censorship-resistant speech

Operator! Get me the president of the world!

Nostr (Notes and Other Stuff Transmitted by Relay) is a protocol attempting to share messages in a decentralized, censorship-resistant fashion. The system is made up of events and relays.

Events

Events are the only object in the nostr system. Their structure is defined in NIP-01.

{
  "id": <32-bytes lowercase hex-encoded sha256 of the serialized event data>,
  "pubkey": <32-bytes lowercase hex-encoded public key of the event creator>,
  "created_at": <unix timestamp in seconds>,
  "kind": <integer between 0 and 65535>,
  "tags": [
    [<arbitrary string>...],
    // ...
  ],
  "content": <arbitrary string>,
  "sig": <64-bytes lowercase hex of the signature of the sha256 hash of the serialized event data, which is the same as the "id" field>
}  

Required event fields.
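To make the field list above concrete, here is a minimal Python sketch that checks only the *shape* of an event. It looks at hex lengths, integer types, the 0 to 65535 kind range, and tags as arrays of strings. The function name `shape_problems` is my own. It deliberately does not recompute the id or verify the signature, because those are separate steps.

```python
import re

HEX64 = re.compile(r"[0-9a-f]{64}")    # id and pubkey: 32 bytes as lowercase hex
HEX128 = re.compile(r"[0-9a-f]{128}")  # sig: 64 bytes as lowercase hex
FIELDS = {"id", "pubkey", "created_at", "kind", "tags", "content", "sig"}

def shape_problems(event: dict) -> list:
    """List what is wrong with an event's shape (empty list = looks OK).
    Checks types and sizes only; it does NOT recompute the id or verify sig."""
    missing = FIELDS - set(event)
    if missing:
        return [f"missing fields: {sorted(missing)}"]
    problems = []
    for name, pattern in (("id", HEX64), ("pubkey", HEX64), ("sig", HEX128)):
        value = event[name]
        if not (isinstance(value, str) and pattern.fullmatch(value)):
            problems.append(f"{name} is not lowercase hex of the expected length")
    # bool is a subclass of int in Python, so compare exact types
    if type(event["created_at"]) is not int:
        problems.append("created_at is not an integer unix timestamp")
    if type(event["kind"]) is not int or not 0 <= event["kind"] <= 65535:
        problems.append("kind is not an integer between 0 and 65535")
    tags = event["tags"]
    if not (isinstance(tags, list)
            and all(isinstance(t, list) and all(isinstance(s, str) for s in t) for t in tags)):
        problems.append("tags is not an array of arrays of strings")
    if not isinstance(event["content"], str):
        problems.append("content is not a string")
    return problems
```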

id

Since this system is designed to be extremely decentralized, with many different implementations, it is important that an event can only be serialized one way; otherwise interoperability goes out the window. So how do we lay out these bits to hash them and get the id?

[
  0,
  <pubkey, as a lowercase hex string>,
  <created_at, as a number>,
  <kind, as a number>,
  <tags, as an array of arrays of non-null strings>,
  <content, as a string>
]  

ID hash serialization layout.

There are very specific rules for this layout so that it is deterministic. It is serialized as a UTF-8 JSON string with no extra whitespace or line breaks, and only a handful of characters in content (line breaks, double quotes, backslashes, tabs, and the like) get escaped. So the bits can only land one way. I am not sure yet why the 0 is used in place of the still-to-be-calculated id. Maybe just to be clear about the use case of this layout? Parsing it back is never ambiguous either. pubkey always has a fixed size, and the JSON array delimits every field, so you know where the tags field starts. You also know where it ends, since the last string is the content.
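The serialization rules above can be sketched in a few lines of Python. The sketch assumes the standard library `json` module is close enough to NIP-01's rules: compact separators remove all optional whitespace, and `ensure_ascii=False` keeps non-ASCII characters verbatim before UTF-8 encoding. The helper names `serialize_for_id` and `compute_event_id` are hypothetical, and one known edge-case gap is noted in a comment.

```python
import hashlib
import json

def serialize_for_id(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    """Build the NIP-01 id payload: [0, pubkey, created_at, kind, tags, content]."""
    # separators=(",", ":") drops every optional space; ensure_ascii=False keeps
    # non-ASCII characters as-is instead of turning them into \uXXXX escapes.
    # Known gap: Python escapes rare control characters (e.g. U+0001) as \u0001,
    # which NIP-01's short escape list does not cover.
    return json.dumps([0, pubkey, created_at, kind, tags, content],
                      separators=(",", ":"), ensure_ascii=False)

def compute_event_id(pubkey: str, created_at: int, kind: int, tags: list, content: str) -> str:
    """sha256 over the UTF-8 bytes of the payload, as 64 lowercase hex characters."""
    payload = serialize_for_id(pubkey, created_at, kind, tags, content)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

For example, a content of `hi "there"` followed by a newline serializes to `"hi \"there\"\n"` inside the array, with no spaces anywhere.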

Once the id (a.k.a. the hash commitment of the event) is found, it is used to create a signature for the event. This is the classic “hey, the owner of this secret private key signed off on this stuff” pattern.
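To show what “signing the id” involves, here is a pure-Python sketch of BIP-340 Schnorr signing and verification over secp256k1, the scheme nostr uses. It is modeled on the BIP-340 reference code and is strictly for illustration. It is slow, not constant-time, and should never touch a real key. Real clients use a vetted library such as libsecp256k1. The function names are my own.

```python
import hashlib

# secp256k1 parameters: field prime, group order, generator point
P = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def tagged_hash(tag: str, msg: bytes) -> bytes:
    """BIP-340 domain-separated hash: sha256(sha256(tag) || sha256(tag) || msg)."""
    t = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(t + t + msg).digest()

def point_add(a, b):
    """Add two curve points in affine coordinates; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and a[1] != b[1]:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], P - 2, P) % P
    else:
        lam = (b[1] - a[1]) * pow(b[0] - a[0], P - 2, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)

def point_mul(pt, k: int):
    """Double-and-add scalar multiplication (NOT constant-time)."""
    result = None
    for i in range(256):
        if (k >> i) & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
    return result

def lift_x(x: int):
    """Recover the even-y point for an x-only public key, or None if not on the curve."""
    if x >= P:
        return None
    y_sq = (pow(x, 3, P) + 7) % P
    y = pow(y_sq, (P + 1) // 4, P)
    if y * y % P != y_sq:
        return None
    return (x, y if y % 2 == 0 else P - y)

def xonly(pt) -> bytes:
    return pt[0].to_bytes(32, "big")

def pubkey_from_seckey(seckey: bytes) -> bytes:
    d = int.from_bytes(seckey, "big")
    if not 1 <= d < N:
        raise ValueError("secret key out of range")
    return xonly(point_mul(G, d))

def schnorr_sign(msg: bytes, seckey: bytes, aux: bytes) -> bytes:
    d0 = int.from_bytes(seckey, "big")
    if not 1 <= d0 < N:
        raise ValueError("secret key out of range")
    pub = point_mul(G, d0)
    d = d0 if pub[1] % 2 == 0 else N - d0  # use the key whose point has even y
    t = bytes(x ^ y for x, y in zip(d.to_bytes(32, "big"), tagged_hash("BIP0340/aux", aux)))
    k0 = int.from_bytes(tagged_hash("BIP0340/nonce", t + xonly(pub) + msg), "big") % N
    if k0 == 0:
        raise ValueError("degenerate nonce")
    r = point_mul(G, k0)
    k = k0 if r[1] % 2 == 0 else N - k0
    e = int.from_bytes(tagged_hash("BIP0340/challenge", xonly(r) + xonly(pub) + msg), "big") % N
    return xonly(r) + ((k + e * d) % N).to_bytes(32, "big")

def schnorr_verify(msg: bytes, pubkey: bytes, sig: bytes) -> bool:
    if len(pubkey) != 32 or len(sig) != 64:
        return False
    pt = lift_x(int.from_bytes(pubkey, "big"))
    r = int.from_bytes(sig[:32], "big")
    s = int.from_bytes(sig[32:], "big")
    if pt is None or r >= P or s >= N:
        return False
    e = int.from_bytes(tagged_hash("BIP0340/challenge", sig[:32] + pubkey + msg), "big") % N
    expected = point_add(point_mul(G, s), point_mul(pt, N - e))  # s*G - e*P
    return expected is not None and expected[1] % 2 == 0 and expected[0] == r
```

In nostr terms, `msg` would be `bytes.fromhex(event["id"])`, the hex of `pubkey_from_seckey(...)` goes in the `pubkey` field, and the hex of the 64-byte signature goes in `sig`. For `aux`, BIP-340 recommends fresh random bytes.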

kind

The kind specifies how the fields of an event are interpreted. A tag can mean totally different things depending on the kind, so it is sketchy to try to identify an event just based on its shape, à la duck typing.

NIP-01 also defines kind ranges that carry some extra storage characteristics.

  • Regular – Stored by relays: 1000 <= n < 10000 || 4 <= n < 45 || n == 1 || n == 2.
  • Replaceable – For each combination of pubkey and kind, only the latest event is held onto: 10000 <= n < 20000 || n == 0 || n == 3.
  • Ephemeral – Not held onto at all: 20000 <= n < 30000.
  • Addressable – For each combination of pubkey, kind, and d-tag value, only the latest event is held onto: 30000 <= n < 40000.
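The kind ranges above translate directly into a small classifier. The sketch also shows one way a relay might apply the “only the latest is held onto” rule, using a dictionary keyed by pubkey and kind (plus the d-tag value for addressable events). `classify_kind`, `replacement_key`, and `keep_latest` are hypothetical names. Treating a missing d tag as an empty string is an assumption of this sketch.

```python
def classify_kind(kind: int) -> str:
    """Map an event kind to its NIP-01 storage class."""
    if not 0 <= kind <= 65535:
        raise ValueError("kind must be between 0 and 65535")
    if 1000 <= kind < 10000 or 4 <= kind < 45 or kind in (1, 2):
        return "regular"
    if 10000 <= kind < 20000 or kind in (0, 3):
        return "replaceable"
    if 20000 <= kind < 30000:
        return "ephemeral"
    if 30000 <= kind < 40000:
        return "addressable"
    return "unspecified"  # none of the NIP-01 ranges cover this kind

def replacement_key(event: dict):
    """Key under which only the latest event is kept, or None for other classes."""
    cls = classify_kind(event["kind"])
    if cls == "replaceable":
        return (event["pubkey"], event["kind"])
    if cls == "addressable":
        # Assumption for this sketch: a missing d tag counts as an empty value.
        d = next((t[1] for t in event["tags"] if len(t) > 1 and t[0] == "d"), "")
        return (event["pubkey"], event["kind"], d)
    return None

def keep_latest(store: dict, event: dict) -> bool:
    """Keep only the newest event per replacement key; return True if stored."""
    key = replacement_key(event)
    if key is None:
        raise ValueError("kind is neither replaceable nor addressable")
    current = store.get(key)
    # Newer created_at wins; on a tie, the lower id wins (NIP-01's tie-break).
    if current is None or (-event["created_at"], event["id"]) < (-current["created_at"], current["id"]):
        store[key] = event
        return True
    return False
```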

NIP-10 defines the omnipresent kind 1.

Relays

Relays relay events between clients. And that is it!

Relays hold onto events posted by clients. They also dole them out to clients who ask for them. So, a store-and-forward relay. NIP-01 defines the bare minimum interface of a nostr relay.

  1. Accept WebSocket connections.
  2. Receive events and validate them (check signatures and IDs).
  3. Store valid events.
  4. Accept subscriptions (filters).
  5. Send matching events to subscribers.
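The five steps above can be sketched as an in-memory relay core in Python. The WebSocket layer is left out, so each client connection is just an identifier. The sketch handles the NIP-01 client messages `EVENT`, `REQ`, and `CLOSE`, and answers with `OK`, `EVENT`, and `EOSE`. It simplifies on purpose: signature checking is skipped (only the id is recomputed), the filter `limit` field is ignored, and `MiniRelay` is a made-up name.

```python
import hashlib
import json

def event_id(ev: dict) -> str:
    # NIP-01 id: sha256 of the compact JSON array [0, pubkey, created_at, kind, tags, content]
    payload = [0, ev["pubkey"], ev["created_at"], ev["kind"], ev["tags"], ev["content"]]
    raw = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def matches(flt: dict, ev: dict) -> bool:
    """Every condition present in one filter must hold (logical AND)."""
    if "ids" in flt and ev["id"] not in flt["ids"]:
        return False
    if "authors" in flt and ev["pubkey"] not in flt["authors"]:
        return False
    if "kinds" in flt and ev["kind"] not in flt["kinds"]:
        return False
    if "since" in flt and ev["created_at"] < flt["since"]:
        return False
    if "until" in flt and ev["created_at"] > flt["until"]:
        return False
    for key, wanted in flt.items():
        if len(key) == 2 and key[0] == "#":  # tag filters such as "#e", "#p", "#t"
            values = {t[1] for t in ev["tags"] if len(t) > 1 and t[0] == key[1]}
            if not values & set(wanted):
                return False
    return True

class MiniRelay:
    def __init__(self):
        self.events = []  # store: every accepted event
        self.subs = {}    # (connection, subscription id) -> list of filters

    def handle(self, conn: str, raw: str) -> list:
        """Process one client message; return (connection, JSON message) pairs to send."""
        msg = json.loads(raw)
        out = []
        if msg[0] == "EVENT":
            ev = msg[1]
            ok = event_id(ev) == ev.get("id")  # signature check omitted in this sketch
            out.append((conn, ["OK", ev.get("id"), ok, "" if ok else "invalid: id mismatch"]))
            if ok:
                self.events.append(ev)
                # forward: push the new event to every live subscription it matches
                for (other, sub_id), filters in self.subs.items():
                    if any(matches(f, ev) for f in filters):
                        out.append((other, ["EVENT", sub_id, ev]))
        elif msg[0] == "REQ":
            sub_id, filters = msg[1], msg[2:]
            self.subs[(conn, sub_id)] = filters
            for ev in self.events:  # replay stored matches, then mark the end of stored events
                if any(matches(f, ev) for f in filters):
                    out.append((conn, ["EVENT", sub_id, ev]))
            out.append((conn, ["EOSE", sub_id]))
        elif msg[0] == "CLOSE":
            self.subs.pop((conn, msg[1]), None)
        return [(c, json.dumps(m)) for c, m in out]
```

Filters inside one `REQ` are OR-ed with `any(...)`. Conditions inside a single filter are AND-ed in `matches`.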

Maybe the one design decision here that is not dead obvious is: why only WebSockets? And what are WebSockets, anyway?

websockets

The WebSocket protocol was introduced alongside HTML5 around 2011 (standardized as RFC 6455), and the goal was to make communication on the web more performant by adding state to the connection. The original high-level protocol, HTTP, is stateless; in its original form, each connection carried a single request and response between parties. Once a WebSocket connection is established, it is persistent and creates a two-way channel between the parties. They can then just fire messages at each other without establishing a new connection each time.

WebSocket connections are still built on TCP/IP; they are not reinventing the stack. WebSockets were actually designed to reuse as much of the existing networking stack as possible, even HTTP itself. The WebSocket protocol begins with an HTTP request instead of some custom handshake. This allows it to work with a bunch of HTTP-only infrastructure out there, like proxies and firewalls. The connection is then “upgraded” to a WebSocket if both parties support it.
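The upgrade is just text over the TCP connection. Here is a Python sketch of the RFC 6455 opening handshake. The client sends a random base64 `Sec-WebSocket-Key`. The server proves it understood by returning base64(sha1(key + fixed GUID)) in `Sec-WebSocket-Accept`. The host name and the helper names are placeholders I made up.

```python
import base64
import hashlib
import os

# Fixed GUID defined in RFC 6455, section 1.3
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def new_key() -> str:
    """Client side: a base64-encoded 16-byte random nonce."""
    return base64.b64encode(os.urandom(16)).decode("ascii")

def upgrade_request(host: str, path: str, key: str) -> str:
    """Plain HTTP/1.1 GET asking the server to switch this connection to WebSocket."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Upgrade: websocket\r\n"
            "Connection: Upgrade\r\n"
            f"Sec-WebSocket-Key: {key}\r\n"
            "Sec-WebSocket-Version: 13\r\n"
            "\r\n")

def accept_value(key: str) -> str:
    """Server side: base64(sha1(key + GUID)) proves the server speaks WebSocket."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def upgrade_response(key: str) -> str:
    """After this response, both sides treat the following bytes as WebSocket frames."""
    return ("HTTP/1.1 101 Switching Protocols\r\n"
            "Upgrade: websocket\r\n"
            "Connection: Upgrade\r\n"
            f"Sec-WebSocket-Accept: {accept_value(key)}\r\n"
            "\r\n")
```

Using the sample key from RFC 6455, `accept_value("dGhlIHNhbXBsZSBub25jZQ==")` gives `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.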

The protocol a layer down, TCP, doesn’t actually care if the bytes it is sending are HTTP, WebSocket, or whatever else. It is up to the client and server to do the HTTP handshake over a TCP connection and then collectively decide “from this exact byte in the TCP stream forward, we’re going to interpret everything as WebSocket frames instead of HTTP!” This way they are still on the same page, but interpreting the bytes differently.
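Here is what “interpreting everything as WebSocket frames” means at the byte level, as a Python sketch of the RFC 6455 base frame for a single, unfragmented message. A frame is a FIN/opcode byte, then a mask bit plus a 7-bit length (with 16- or 64-bit extended lengths), then an optional 4-byte masking key, then the payload. Client-to-server frames must be masked; server-to-client frames are not. The function names are hypothetical, and fragmentation, control frames, and reserved bits are left out.

```python
import struct
from typing import Optional, Tuple

def encode_frame(payload: bytes, opcode: int = 0x1, mask_key: Optional[bytes] = None) -> bytes:
    """Build one final (FIN=1) frame; opcode 0x1 = text. Pass mask_key when acting as a client."""
    header = bytes([0x80 | opcode])
    mask_bit = 0x80 if mask_key is not None else 0
    n = len(payload)
    if n < 126:
        header += bytes([mask_bit | n])
    elif n < 1 << 16:
        header += bytes([mask_bit | 126]) + struct.pack("!H", n)  # 16-bit extended length
    else:
        header += bytes([mask_bit | 127]) + struct.pack("!Q", n)  # 64-bit extended length
    if mask_key is None:
        return header + payload
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + mask_key + masked

def decode_frame(frame: bytes) -> Tuple[int, bytes]:
    """Parse one complete frame back into (opcode, unmasked payload)."""
    opcode = frame[0] & 0x0F
    masked = frame[1] & 0x80
    n = frame[1] & 0x7F
    pos = 2
    if n == 126:
        n = struct.unpack("!H", frame[2:4])[0]
        pos = 4
    elif n == 127:
        n = struct.unpack("!Q", frame[2:10])[0]
        pos = 10
    if masked:
        key = frame[pos:pos + 4]
        pos += 4
        return opcode, bytes(b ^ key[i % 4] for i, b in enumerate(frame[pos:pos + n]))
    return opcode, frame[pos:pos + n]
```

A nostr client would wrap each JSON message this way. For example, it would send `["REQ", ...]` as the UTF-8 payload of a masked text frame.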

Sounds good, but should we be using this everywhere? Is it that powerful? It may help to break down the timeline of post-HTTP/1.0 protocols and their characteristics.

# First request
Client -> Server: SYN
Server -> Client: SYN-ACK
Client -> Server: ACK
Client -> Server: HTTP Request
Server -> Client: HTTP Response
[Connection closed]

# Second request
Client -> Server: SYN
Server -> Client: SYN-ACK
Client -> Server: ACK
Client -> Server: HTTP Request
Server -> Client: HTTP Response
[Connection closed]  

HTTP/1.0, where every request gets its own connection.

Persistent connections (keep-alive) became the default in HTTP/1.1, first published in 1997 as RFC 2068 and revised in 1999 as RFC 2616. This let multiple requests share a single TCP connection, at least for a little while.

# First request
Client -> Server: SYN
Server -> Client: SYN-ACK
Client -> Server: ACK
Client -> Server: HTTP Request
Server -> Client: HTTP Response
[Connection stays open]

# Second request
Client -> Server: HTTP Request
Server -> Client: HTTP Response
[Connection stays open]  

HTTP/1.1 with the TCP connection maintained with keep-alive.

Keep-alive brought the connection overhead way down, but clients still needed long-polling patterns to pull down new data and/or actually keep the connection alive. This is where WebSockets entered the picture (2011), taking the next logical step: pushing data both ways over the persistent connection.

With that said, HTTP/2 showed up in 2015 (RFC 7540) with some overlapping features, like multiplexing many streams over a single persistent connection. And HTTP/3 arrived on the scene in 2022 (RFC 9114). HTTP/3 is a massive departure from earlier versions, since it replaces the underlying TCP with a fancy new UDP-based protocol, QUIC (Quick UDP Internet Connections). This modification of the network stack is a way bigger migration lift than a dynamic connection upgrade. And while we are here, WebTransport showed up in 2023 and is arguably the natural successor to WebSockets, but built on HTTP/3 instead of HTTP/1.1.

Early nostr development started around 2021, which gives some insight into why it rolled with WebSockets. HTTP/3 was very fresh, not even finalized, so it could not realistically be considered as a base protocol. WebSockets, on the other hand, were tried and true. That just leaves: why not HTTP/2? At its heart, HTTP/2 is still optimized for request/response, whereas WebSockets are a bidirectional stream. Relays push events to subscribed clients as they arrive, rather than only answering one-off requests, so WebSockets were the clear fit.

There are some challenges with WebSockets that are relevant to the nostr use case. Infrastructure like load balancers and proxies was originally developed with stateless HTTP in mind. Maintaining a connection complicates things, but it isn’t much of an issue these days, since modern proxies have developed neat ways to support WebSockets. Probably a bigger issue for the nostr use case is the resources required for persistent connections. If a relay has hundreds or thousands of clients subscribed to it, that is a lot of parallel connections. The same goes for a client attempting to post to a ton of relays. But I believe there are ways to mitigate these resource requirements by leveraging the simplicity of the nostr topology.

topology

The extreme simplicity of the nostr topology is the key to its power. Clients talk to relays, relays do not talk to each other. If a client wants wider distribution, it is on them to connect to more relays.
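From the client side, “connect to more relays” boils down to sending the same signed message to every relay on a list. Here is a minimal asyncio sketch. The `send` callable is a hypothetical stand-in for a real WebSocket client. The `max_parallel` cap on simultaneous connections is my own addition, one possible way to keep the cost of many connections in check, not something nostr prescribes.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict, List

# Hypothetical transport: (relay_url, raw_message) -> True if the relay accepted it.
Send = Callable[[str, str], Awaitable[bool]]

async def broadcast(event: dict, relays: List[str], send: Send, max_parallel: int = 8) -> Dict[str, bool]:
    """Send one already-signed event to every relay; report per-relay success."""
    message = json.dumps(["EVENT", event])  # identical bytes go to every relay
    limit = asyncio.Semaphore(max_parallel)  # cap simultaneous connections

    async def one(url: str) -> bool:
        async with limit:
            try:
                return await send(url, message)
            except Exception:
                return False  # one unreachable relay should not sink the broadcast

    results = await asyncio.gather(*(one(url) for url in relays))
    return dict(zip(relays, results))
```

The relays never coordinate. Wider distribution is purely the client's loop over its own relay list.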

One can quickly think of many use cases that would be easier to implement if relays could talk to other relays. However, that introduces a boatload of terribly complex failure scenarios, the kind we see in federated systems:

  • Relay consensus.
  • Event forwarding loops.
  • Relay relationships.

There is nothing stopping someone from running a client that simply reads from one relay and posts to another, but it isn’t codified in any way in the protocol.
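Such a mirroring client is easy to sketch because events are self-authenticating. An event read from one relay can be republished verbatim to another, and its id and signature still check out. This hypothetical `mirror` function only does the message translation. It turns relay-to-client `["EVENT", sub_id, event]` messages into client-to-relay `["EVENT", event]` messages and skips ids it has already forwarded. The networking on both ends is left out.

```python
import json
from typing import Iterable, List, Set

def mirror(relay_messages: Iterable[str], seen: Set[str]) -> List[str]:
    """Translate EVENT messages read from one relay into EVENT messages for another."""
    out = []
    for raw in relay_messages:
        msg = json.loads(raw)
        if msg[0] != "EVENT" or len(msg) != 3:
            continue  # ignore EOSE, NOTICE, OK, and anything else
        event = msg[2]
        if event["id"] in seen:
            continue  # already forwarded; avoids re-posting duplicates
        seen.add(event["id"])
        out.append(json.dumps(["EVENT", event]))  # republished as-is; the signature still holds
    return out
```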

Identity

Identity in nostr is just a public/private key pair. Clients create messages, sign them with the user’s private key, and then post them to relays. What is the goal here? Well, we get messages that are really easy to validate and really hard to censor. One thing nostr is not attempting to create is consensus like bitcoin. No part of the protocol even dictates that relays talk to each other. If a user cares that their message gets out there and is never censored, they should blast it to a bunch of different relays.

Nostr uses Schnorr signatures over the secp256k1 curve (BIP-340), like bitcoin. By convention, keys are usually tossed around as bech32-formatted strings to help us humans. This convention is defined in NIP-19. The NIP includes an example: the hex public key 3bf0c63fcb93463407af97a5e5ee64fa883d107ef9e558472c4eb9aaaefa459d translates to npub180cvv07tjdrrgpa0j7j7tmnyl2yr6yr7l8j4s3evf6u64th6gkwsyjh6w6. As with bitcoin addresses, the human-readable prefix indicates what information is stored in the string: npub is a user’s public key, while nsec is a private key.
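The hex-to-npub translation can be reproduced with a short bech32 sketch in Python, following the BIP-173 reference algorithm. The 32 key bytes are regrouped into 5-bit values, and a 6-character checksum is computed over both the prefix and the data. Then everything is mapped onto the bech32 alphabet. The helper names are my own. BIP-173's 90-character length limit is not enforced, since longer NIP-19 entities exist.

```python
# bech32 alphabet and checksum generator from BIP-173
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = (0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3)

def polymod(values):
    """BCH checksum over a list of 5-bit values."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk

def hrp_expand(hrp):
    # The prefix is mixed into the checksum, so changing "npub" invalidates it.
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def convert_bits(data, from_bits, to_bits, pad=True):
    """Regroup from_bits-wide integers into to_bits-wide integers."""
    acc, bits, out = 0, 0, []
    maxv = (1 << to_bits) - 1
    for value in data:
        acc = (acc << from_bits) | value
        bits += from_bits
        while bits >= to_bits:
            bits -= to_bits
            out.append((acc >> bits) & maxv)
    if pad:
        if bits:
            out.append((acc << (to_bits - bits)) & maxv)
    elif bits >= from_bits or (acc << (to_bits - bits)) & maxv:
        raise ValueError("invalid padding")
    return out

def bech32_encode(hrp: str, payload: bytes) -> str:
    data = convert_bits(payload, 8, 5)
    pm = polymod(hrp_expand(hrp) + data + [0] * 6) ^ 1  # constant 1 = plain bech32, not bech32m
    checksum = [(pm >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data + checksum)

def bech32_decode(text: str):
    hrp, sep, rest = text.lower().rpartition("1")
    if not sep or len(rest) < 6:
        raise ValueError("missing separator or checksum")
    data = [CHARSET.index(c) for c in rest]  # ValueError for characters outside the alphabet
    if polymod(hrp_expand(hrp) + data) != 1:
        raise ValueError("bad checksum")
    return hrp, bytes(convert_bits(data[:-6], 5, 8, pad=False))

def npub_from_hex(pubkey_hex: str) -> str:
    return bech32_encode("npub", bytes.fromhex(pubkey_hex))
```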

web+nostr // NIP-21

So you want to share your nostr public key somewhere so people can message you? NIP-21 defines a URI scheme for apps to use, and it is pretty straightforward: nostr:. I am seeing web+nostr out in the wild, but I am not sure why the web+ is being added in this case. The HTML spec mentions it: “Effectively namespaces web-based protocols from other, potentially less web-secure, protocols.”
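Handling these URIs in an app is mostly prefix stripping. Here is a minimal Python sketch. It assumes NIP-21's rule that the part after `nostr:` is a NIP-19 identifier other than `nsec`, so secret keys are never shared this way. Accepting `web+nostr:` as well is my own assumption, based only on seeing it in the wild. The hypothetical `parse_nostr_uri` checks the prefix only and does not validate the bech32 checksum.

```python
from typing import Tuple

# NIP-19 prefixes allowed after "nostr:"; nsec (secret keys) is deliberately excluded.
ALLOWED_PREFIXES = {"npub", "note", "nprofile", "nevent", "naddr"}
# "web+nostr:" is accepted only because it shows up in the wild (assumption, not NIP-21).
SCHEMES = ("nostr:", "web+nostr:")

def parse_nostr_uri(uri: str) -> Tuple[str, str]:
    """Split a nostr URI into (prefix, bech32 identifier). Checksum is NOT verified."""
    for scheme in SCHEMES:
        if uri.lower().startswith(scheme):
            ident = uri[len(scheme):].lower()
            break
    else:
        raise ValueError("not a nostr URI")
    prefix, sep, _ = ident.rpartition("1")  # bech32 separator is the last "1"
    if not sep or prefix not in ALLOWED_PREFIXES:
        raise ValueError(f"unsupported or forbidden identifier: {ident!r}")
    return prefix, ident
```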