Bitcoin

// Fix the money, fix the world

Reactor online. Sensors online. Weapons online. All systems nominal.

Consensus

The value proposition of bitcoin is derived entirely from its leaderless consensus, known as Nakamoto Consensus.

The word consensus brings to mind a few things. Well…if I am being honest, these days I just think of bitcoin. But I used to think of other things! Like a jury where members are tasked with finding a verdict. Or computers in distributed systems which rely on quorums to make decisions. Juries and quorums are wildly different scenarios, but they have a commonality: trust is placed in relatively few members to pragmatically create consensus. Juries are small subsets of the population and quorums are small subsets of computers in a system. Generally, the time it takes to find consensus doesn’t scale well with the number of participants, so we trade trust for practicality.

This trust-for-consensus trade works for a lot of scenarios in life. But trust is power. And sadly, humans tend to abuse the shit out of power. So back to bitcoin’s value proposition: consensus, but without leaders. No leaders means no trust to abuse.

Sounds good, but what exactly is bitcoin’s Nakamoto Consensus creating consensus on? The consensus covers a very specific set of virtual coins. A currency of sorts. And that is it! Now whether you think this has any value…well, how much trust has been placed in the Federal Reserve to pragmatically manage the dollar? This is the entrance to the rabbit hole.

In any case, Nakamoto Consensus is achieved through a protocol backed by game theory and applied cryptography. Users place trust in this protocol instead of third parties.

proof of work

Bitcoin outputs an ever-growing list of transactions involving virtual coins. Consensus covering this list, and only this list, is created because users trust the protocol instead of middlemen. The protocol is able to pull this off by leveraging some applied cryptography.

Let’s build this magic from the ground up.

A characteristic of cryptographic hash functions is leveraged over and over in Bitcoin: hard to reverse, but easy to verify. Compared to the simple algebra functions you are comfortable with, this is a very weird property. For example, let’s take the super simple function 2 + x = 5. If someone came up and said “Hey, x is 3.”, you could easily verify that by plugging it in 2 + 3 = 5. But with some simple maths, you could have just as easily reversed the function yourself and calculated x = 5 - 2 = 3. Using really mind-bending maths, a hash function keeps the easy to verify part, but makes the reverse part impossibly hard. It is what we call a one-way function. A real world metaphor might be cooking pancakes. It’s easy for someone to try a bite of pancakes and verify “Yea, these are pancakes”, it is harder for them to guess the exact ingredients used to cook those pancakes. And to make this metaphor more like cryptographic reality, we need to imagine a world where pancakes can have infinite ingredients, not the common handful.
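
To make hard to reverse, but easy to verify concrete, here is a small Python sketch using SHA256. Verifying a claimed input is one hash call; “reversing” a digest means brute-force guessing. The names (secret_input, verify) are made up for illustration:

import hashlib

# The "pancakes": some input only the cook knows.
secret_input = b"flour, eggs, milk"
target_digest = hashlib.sha256(secret_input).hexdigest()

def verify(guess: bytes) -> bool:
    # Easy: one hash call and a comparison.
    return hashlib.sha256(guess).hexdigest() == target_digest

print(verify(b"flour, eggs, milk"))  # True, instantly

# Hard: recovering an input from target_digest alone means guessing.
# With a 256-bit output space, expect on the order of 2^256 guesses.

Easy to verify, infeasible to reverse.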

Leveraging this hard to reverse, but easy to verify property is how the protocol balances user incentives to generate consensus. The incentives influence each other, so we’ll just start with how a transaction is added to the list and go from there. As the system comes together, you’ll see how one-way functions push and pull users toward a single spot: consensus.

New Bitcoin transactions are added one group, known as a block, at a time to the list of all transactions ever, which is known as the blockchain. Which transactions go into the next block is determined by a Bitcoin user. But which user? Wouldn’t this user have a ton of power since they could dictate what transactions are “official” or who is allowed to send transactions? The user does in fact have this power, but to counter this, the protocol only allows for short-term, per-block, leaders. So why would a user even want this responsibility? Well it ain’t altruistic, there is a financial incentive. The leader collects all the per-transaction fees in the block that they add to the blockchain (plus, for now, a subsidy of newly minted bitcoin).

So now the tricky part. How does the protocol determine a leader? And why would other users trust this leader?

Blessing a leader is not a lottery, users must contribute work in order to have the chance to be the leader. The more work contributed, the higher the chance (linearly) to be the leader. What is this work? The leader needs to do the hard part of the hard to reverse, but easy to verify hash function property. They have to guess an ingredient in the pancakes. But where the set of pancake ingredients is extremely limited, the “ingredient” set for a bitcoin block is extremely massive. Think atoms-in-the-universe scale. Users aren’t sitting at their computers manually entering in random bytes, that would take forever. Turns out, computers are good at this guessing game, so a user delegates to their computer to do it a few million times a second for them. Computers don’t run on hopes and dreams though; these calculations require electricity, a real-world limited resource which costs something.

About every ten minutes (this is fairly predictable due to an algorithm in the protocol, but we won’t get into that here) a user (well, their computer) guesses the right input and gets to add their new block. They are the leader for that block. All other users are able to do the second half of hard to reverse, but easy to verify and easily verify that work must have been done to create the block (…proof of work).
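
Here is what the guessing game looks like as code. A toy sketch, not the real consensus rules: real bitcoin double-SHA256’s an 80-byte block header and compares it against a numeric target, but a hex-prefix check captures the idea:

import hashlib

def mine(block_data: bytes, difficulty_prefix: str = "0000") -> int:
    nonce = 0
    while True:
        candidate = block_data + nonce.to_bytes(8, "little")
        digest = hashlib.sha256(hashlib.sha256(candidate).digest()).hexdigest()
        if digest.startswith(difficulty_prefix):
            return nonce  # proof the guessing work was done
        nonce += 1

nonce = mine(b"block full of transactions")
# Anyone can verify by hashing once with this nonce, no guessing needed.

Grinding nonces until the hash has the required prefix.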

There is little incentive to broadcast bogus blocks since they can easily be detected and discarded. Proof of work allows bitcoin to be censorship resistant since there are no long term leaders with power to abuse. If Mallory really hates Bob and is thinking “If I ever become the leader, I am not going to include Bob’s transactions in the block, I hate that guy.”, Bob can just take his transactions to any other user attempting to be the leader (called a miner in bitcoin lingo). Even if Mallory becomes the leader for a block, that doesn’t influence any of the following blocks, she will have to do the work all over again and continue to be the leader in order to censor Bob.

There is a high risk to ignoring valid blocks. We have to sink into the blockchain structure to see why. The blockchain is the list of all previous blocks, and the blocks are chained together with the same hard to reverse, but easy to verify hash function, making it easy to verify the whole blockchain, as in, every transaction ever. If a block is modified somewhere in that chain, all blocks after it become invalid. So if a user verifies the blockchain, they know nothing has been changed or tampered with all the way back to the beginning of bitcoin time. This layers proof-of-works on top of each other, making it harder and harder to modify a block the deeper it gets in the chain.
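
A toy sketch of the domino effect. Real blocks commit to the previous block’s header hash (and to their transactions through a merkle root); linking hashes directly is enough to show the idea:

import hashlib

def block_hash(prev_hash: bytes, txs: bytes) -> bytes:
    return hashlib.sha256(prev_hash + txs).digest()

genesis = block_hash(b"\x00" * 32, b"genesis txs")
block1 = block_hash(genesis, b"txs 1")
block2 = block_hash(block1, b"txs 2")

# Change the transactions behind block1...
tampered1 = block_hash(genesis, b"txs 1, but Mallory pays herself")

# ...and block2's committed prev-hash is now stale. Every block after
# the edit is invalid and its proof of work must be redone.
assert block_hash(tampered1, b"txs 2") != block2

One edit invalidates every block built on top of it.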

Let’s say Mallory paid Bob for some pancakes and the transaction is now three blocks deep in the blockchain. As stated earlier, Mallory hates Bob, so she decides to try and change the transaction, sending the funds back to herself. In order to do this, she needs to change the transaction in the block and re-calculate the block’s proof-of-work. But changing the transaction changes all the newer blocks as well, so she actually has to calculate proof-of-works for the next three blocks! Plus, the rest of the bitcoin users are actively adding new valid blocks to the blockchain, so Mallory has to do all of this faster than everyone else combined. Turns out this costs way more than the pancakes and there is almost no incentive to even try.

So what differentiates the bitcoin blockchain from another one which follows the same rules? The one with the most work poured into it (not necessarily the longest) is the blockchain (assuming the blocks are valid according to the network…some blocksize-wars history there I am sidestepping for now). Using a blockchain with less work would just be risky for any user (“Hey man, I totally have 1,000 BTC, it’s just over here on this other blockchain which only three people use…”).

So to summarize, there is incentive to participate in mining and a low barrier of entry. There is no incentive to produce bogus blocks and high risks to ignore valid blocks…hey, distributed consensus! And kinda beautifully, the protocol can be boiled down to energy in => consensus out.

51% attack

A 51% attack is probably the most well known theoretical attack on bitcoin. Luckily, (as in that Satoshi person was pretty smart) the work in Proof of Work also helps protect against this kind of attack.

A 51% attack is when someone controls at least 51% of the work being poured into bitcoin. At this point, chances are they will guess the next block and thus be the short term leader. And as long as they hold at least 51% of the work, chances are that will happen again and again. With this power they could:

  1. Control what transactions get into the blockchain.
  2. Mine a “shadow” blockchain and perform a double-spend by swapping it with the public blockchain. It should be noted that this would be a public action though, so the rest of the network would be aware it’s happening.

This would suck, but it’s not actually that powerful. If an attacker could use the 51% to re-write transactions deep in the blockchain, somehow without it being obvious to everyone else, then it would be powerful. But that isn’t possible due to the proof-of-work chain. And as of 2021, acquiring and running 51% of the bitcoin hashing power would cost billions of USD. The cost-to-benefit of this attack hasn’t made sense since the early days of bitcoin (like 2012-ish).

If there ever is a successful 51% attack, it probably means the end of bitcoin, so maybe an attacker who wants to destroy bitcoin will try with this goal in mind. But if an attacker has enough power to even consider this attack, which today is limited pretty much to a handful of states, there are probably cheaper ways to try and destroy bitcoin (but good luck).

eclipse attack

An eclipse attack is kind of a more focused sybil attack. In a sybil attack, a bad actor floods a distributed network with nodes that they control in order to make it look like the network has strong consensus on something. In reality, it’s just one person’s opinion made to look like a lot of different people. An eclipse attack is when the target of the attack is just one person, not the whole network. An attacker singles out someone’s node and floods it with a lot of nodes all controlled by the attacker. In the context of bitcoin, an attacker might perform an eclipse attack on someone in order to try and trick them into thinking a different blockchain is the real one.

Proof-of-work again helps defend against this type of attack, and sybil attacks in general, because it doesn’t matter how many nodes an attacker peers to their target node; the target only needs one honest node to relay the real blockchain, and they can easily verify that it is the real one. For an attacker to really gain anything, they still have to produce valid blocks (work), so the attack is expensive and at the same time cheap to beat. It’s probably not worth it.

blocksize

There is an “artificial” blocksize limit which caps the number of transactions per second on the bitcoin blockchain. Miners are generally incentivized to choose transactions which pay higher fees. The scarce resource in this market is bytes on the blockchain.

This leads to a few questions.

  • Is there a “real” limit to the blocksize?
  • Why have the artificial limit?
  • Is this related to consensus?

The blocksize debate back in 2015 is bitcoin’s most famous holy war (so far!). As with all holy wars, there was a lot of noise coming from parties with different interests. I personally believe most of the noise was generated by parties which valued bitcoin succeeding quickly over bitcoin succeeding at all (e.g. my company only survives if bitcoin is mass adopted in 2 years, and if that doesn’t happen, I no longer care about bitcoin because my company just failed).

Bitcoin’s only value proposition is its consensus without leaders. If that is degraded, you might as well use a simple, centralized database. So any changes to the bitcoin protocol should examine how they affect the value of the consensus. The blocksize limit determines what quality of computer is necessary to run a node: a lower limit means a cheaper computer will do. The lower the limit, the more distributed the network, and the higher the value of the consensus. I don’t know what the “real” limit is, but increasing the limit even a little bit starts to eat away at bitcoin’s only value proposition. I don’t think it is worth it. Instead, second layer applications (e.g. the Lightning Network) are responsible for increasing transactions-per-second for different scenarios and developing more value on top of the layer one consensus.

energy

The energy use of bitcoin causes a lot of fear. Bitcoin is essentially energy in => consensus out: the left side of that equation is objectively measurable, while the right side is completely subjective. The same is true for any household appliance, but bitcoin’s energy use is a few orders of magnitude higher than a single appliance’s.

The fact that bitcoin requires energy isn’t a fault, it’s a feature. bitcoin’s value is its consensus. If, hypothetically, mining bitcoin required some super rare resource, that resource would most likely be controlled by just a few players. With only a few users able to be leaders, bitcoin is less distributed, less robust, lowering the value of the consensus. Mining bitcoin only requires electricity and an internet connection. These low barriers to entry keep the consensus as distributed as possible, raising its value.

Another hypothetical, what if the resource required to mine bitcoin was virtual? What if the resource was bitcoin itself? This specific case goes by a whole new name: proof of stake. It is a popular idea because the consensus mechanism equation could become a lot less energy in => consensus out. But, like a lot of the consensus design trade-offs, I think the equation actually becomes a lot less energy in => a lot less valuable consensus out. This is difficult to analyze, but I believe the crux of the issue is that proof-of-stake adds a ton of complexity by introducing a weird incentive loop. Where bitcoin’s proof-of-work is anchored in the physical world (electricity), proof-of-stake is anchored in the virtual world. And that virtual world is the same world it is tasked with securing.

Digging into that a bit, it is easy to model proof-of-work miner incentives. Contribute energy, get bitcoin. If you stop contributing, you stop getting bitcoin. That’s it. A proof-of-stake miner “stakes”, or ties up bitcoin, so it can’t be used for anything else. They now earn more bitcoin. This is where things start to get weird. The miners can now just passively make more bitcoin. Do the rich get richer? Do the rich have the power to tweak the rules so they get even richer? The incentives are getting hard to follow.

I believe in the long run, proof-of-stake has strong centralization incentives which leads to weak, value-less consensus.

scarcity

Bitcoin generates consensus over 21 million virtual coins. More specifically, consensus on how those coins are locked. While there are an infinite number of scripts to lock coins with, the coins are almost always unlocked with a private key, which is a 256-bit number. One way to look at this is that there are only 2^256 private keys, now and forever. It is a finite set. Of those private key possibilities, a subset are not usable due to the mathematics, and most don’t have any bitcoin assigned to them (and chances are, never will). The interesting thing here though is that this is a form of scarcity, but informational scarcity.

Currencies always rely on scarcity in order to be trustworthy and valuable. People trust gold because it is hard to find and dig out of the Earth. There is a low chance that someone can double the amount of gold in the world in a single day and devalue everyone’s holdings. So the physical scarcity gives people confidence it will hold value. But what if that changed? What if we learned how to fabricate gold in a lab? Or mine it on an asteroid? While this might not happen tomorrow, the informational scarcity of bitcoin is a stronger guarantee than the physical scarcity of gold.

Transactions

Bitcoin is transactions and the blockchain orders them.

structure

A bitcoin transaction is made up of a set of inputs and a set of outputs. Each input maps to a previous transaction’s output. Transaction outputs which have not been spent, as in have not been mapped to an input of another transaction, are called “Unspent Transaction Outputs” or UTXOs.

{
  "version": 1,
  "locktime": 0,
  "vin": [
    {
      "txid": "7957a35fe64f80d234d76d83a2a8f1a0d8149a41d81de548f0a65a8a999f6f18",
      "vout": 0,
      "scriptSig" : "3045022100884d142d86652a3f47ba4746ec719bbfbd040a570b1deccbb6498c75c4ae24cb02204b9f039ff08df09cbe9f6addac960298cad530a863ea8f53982c09db8f6e3813[ALL] 0484ecc0d46f1918b30928fa0e4ed99f16a0fb4fde0735e7ade8416ab9fe423cc5412336376789d172787ec3457eee41c04f4938de5cc17b4a10fa336a8d752adf",
      "sequence": 4294967295
    }
  ],
  "vout": [
    {
      "value": 0.01500000,
      "scriptPubKey": "OP_DUP OP_HASH160 ab68025513c3dbd2f7b92a94e0581f5d50f654e7 OP_EQUALVERIFY OP_CHECKSIG"
    },
    {
      "value": 0.08450000,
      "scriptPubKey": "OP_DUP OP_HASH160 7f9b1a7fb68d60c536c2fd8aeaa53a8f3cc025a8 OP_EQUALVERIFY OP_CHECKSIG",
    }
  ]
}

A transaction with one input and two outputs.

A UTXO contains two parts:

  1. An amount of bitcoin.
  2. A cryptographic puzzle, scriptPubKey, which must be solved in order to spend the bitcoin.

The scriptPubKey name made sense historically, but now it would probably be better called “locking script” or maybe “witness script” (but more on that later). A fun fact about a bitcoin output: it must be spent in its entirety. This usually leads to a “change” output in a transaction which sends the extra bitcoin back to the sender.

Each input contains four parts:

  1. A transaction ID, referencing the transaction that contains the UTXO being spent.
  2. An output index (vout), identifying which UTXO from that transaction is referenced.
  3. A scriptSig, the script which satisfies the puzzle placed on the UTXO.
  4. A sequence number, a weird re-purposed field now used to enforce locktime and replace-by-fee.

The scriptSig is another legacy name, today it’s more like the “unlocking script” or “witness”.

For a transaction to be valid:

  1. All inputs must map to outputs which haven’t already been spent.
  2. All inputs must unlock outputs.
  3. The sum of the inputs must be greater than or equal to the sum of the new outputs. Any difference is an implicit transaction fee paid to get on the blockchain (sketched after this list).
  4. The locktime must be passed (be it block time or height).
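
A minimal sketch of rule 3, assuming amounts in satoshis (the check_amounts helper is hypothetical, just for illustration):

def check_amounts(input_values: list[int], output_values: list[int]) -> int:
    # Rule 3: inputs must cover outputs; the remainder is the implicit fee.
    fee = sum(input_values) - sum(output_values)
    if fee < 0:
        raise ValueError("outputs exceed inputs, transaction is invalid")
    return fee

# Spend a 10,000,000 sat UTXO: 9,950,000 to the payee, the rest is the fee.
print(check_amounts([10_000_000], [9_950_000]))  # 50000

The fee check from rule 3, in Python.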

script

The scriptPubKey and scriptSig combine to form a little program to unlock a UTXO. The program is written in a programming language designed just for bitcoin called Script. Script is intentionally very constrained since these programs need to be verified all the time by all sorts of computers. It is not Turing-complete, so no loops or complex control flow, keeping program complexity (relatively) predictable.

To run a transaction script, the scriptSig + scriptPubKey are concatenated in that order (kinda feels backwards, but makes sense given how the program is executed). The program is executed from left => right. All script programs only contain two types of tokens: opcodes and data. Data is simply loaded onto the stack. Opcodes can pull data off the top of the stack, do some calculations, and load more data on the stack. A program unlocks a UTXO if it can (1) run to completion and (2) end with a single truthy value on the stack, anything but 0/an empty array (this used to be a little more open, but was made simpler with the modern CLEANSTACK rule).

An extremely simple (and insecure) program example is a scriptPubKey of 3 OP_ADD 5 OP_EQUAL. The UTXO in this case is locked by a script which requires a scriptSig of 2 to unlock. The resulting transaction script would be 2 3 OP_ADD 5 OP_EQUAL.

The program execution:

  1. 2 is loaded on the stack.
  2. 3 is loaded on top of 2 on the stack.
  3. OP_ADD pops the two data values off, adds them together, and puts 5 on the stack.
  4. 5 is loaded on top of the 5 on the stack.
  5. OP_EQUAL pops the two data values off, compares them, and puts a TRUE on the stack.

Since the program ran to the end and has a TRUE on the stack, the output is “unlocked”. Pretty simple! Only issue with this example is that anyone could unlock the UTXO as long as they understood simple addition. Luckily, bitcoin supports a few more opcodes that make it useful.
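
As a sketch, the whole interpreter for this example fits in a few lines of Python. A toy, obviously: real Script has byte-array stack items and ~100 opcodes, and the run_script name is made up:

def run_script(tokens: list) -> bool:
    stack = []
    for token in tokens:
        if token == "OP_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(a == b)
        else:
            stack.append(token)  # data push
    # Unlocked if we ran to completion with a single truthy value left.
    return len(stack) == 1 and bool(stack[0])

print(run_script([2, 3, "OP_ADD", 5, "OP_EQUAL"]))  # True

A toy stack machine running the scriptSig + scriptPubKey program.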

Fun fact, a lot of bitcoin documentation is not explicit about the “data push” operators which are necessary to push any data on the stack. Maybe docs kinda gloss over it since there are quite a few ways to go about it and it is a bit confusing.

data pushes

Opcodes 0-75 are used to push data on the stack. Opcode 1 pushes the next byte on the stack, opcode 2 pushes the next two bytes, and so on (opcode 0 is special, see below). This obviously tops out at 75 bytes, so opcodes 76-78 are special: 76 (OP_PUSHDATA1) means the next byte contains the number of bytes to push, 77 (OP_PUSHDATA2) uses the next two bytes, and 78 (OP_PUSHDATA4) the next four.

In bitcoin docs, you often see data pushes in script simplified to just angle brackets, <data>, instead of explicitly stating the necessary data push opcode.

OP_0, or OP_FALSE, is not a no-op; it pushes an empty array to the stack. You see it used to cancel out the multisig bug which pops an extra item off the stack. OP_1, a.k.a. OP_TRUE, is defined as opcode 81 (!) since 1 is already taken for a data push. There is a specific no-op operator though: 97 is OP_NOP.
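
A sketch of the encoding rules above in Python (minimal pushes only; the push_data helper name is made up):

def push_data(data: bytes) -> bytes:
    n = len(data)
    if n <= 75:
        return bytes([n]) + data                              # opcodes 1-75
    if n <= 0xFF:
        return bytes([76, n]) + data                          # OP_PUSHDATA1
    if n <= 0xFFFF:
        return bytes([77]) + n.to_bytes(2, "little") + data   # OP_PUSHDATA2
    return bytes([78]) + n.to_bytes(4, "little") + data       # OP_PUSHDATA4

assert push_data(b"\xab" * 3).hex() == "03ababab"
assert push_data(b"\xab" * 80)[:2].hex() == "4c50"  # opcode 76, then length 80

Minimal data push encoding.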

timelocks

A timelock is a restriction on when a transaction, or part of a transaction, is considered valid. There are a few different ways to do this with bitcoin transactions. But first, why would someone want to do this? It might be a little counterintuitive, if you are sending a transaction to pay for something, why would you want to restrict when the transaction is valid? Don’t you need it to be valid immediately? For a payment that is probably true, but more complicated contracts leverage restrictions like timelocks. For example, Alice could sign a transaction and give it to Bob as collateral: “Hey Bob, here is a transaction paying you, but it isn’t valid for 30 days. I might pay you sooner, which would void this transaction (can’t double spend an output).” The locktime can be set in block height or a unix timestamp; in either case though, it is an absolute time.

Transactions have the locktime field to control when a whole transaction is valid. This is enforced by consensus rules, so miners won’t attempt to include them in a block beforehand. I see “timelock” vs. “locktime” terminology mixed and matched, but I am using “timelock” as the more general term for all methods to restrict a transaction based on time. A shortcoming of the locktime timelock is that there is nothing stopping users from spending a UTXO before the locktime. In our Alice and Bob example, Bob has a transaction he can broadcast in 30 days and collect the funds, but Alice could broadcast a different transaction beforehand which doesn’t have a timelock and spends the same outputs, leaving Bob with a worthless transaction. Some script opcodes were added to address this. BIP65 added OP_CHECKLOCKTIMEVERIFY which inspects the spending transaction and enforces that it has a locktime set at or after the specified time.

<Bob's pubkey> OP_CHECKSIGVERIFY <now + 30 days> OP_CHECKLOCKTIMEVERIFY

Standard pubkey check extended with an absolute timelock.

With CLTV, Alice can now broadcast a transaction which contains an output that requires Bob’s signature, but also a timelock of 30 days. Now Bob knows the funds are his for sure in 30 days. It is another set of tradeoffs, but a useful primitive to build with.

Absolute timelocks are useful for restricting exactly when a transaction or output is valid. But sometimes you need something a little more relative. Relative timelocks were added with BIP68. Instead of being an absolute time, these timelocks depend on the confirmation time of the transaction being spent. “The transaction must be buried by 10 blocks before the UTXO is spendable”. So instead of having to guess the absolute time when creating the transaction, the clock starts ticking when the transaction is confirmed.

The interface of relative timelocks mirrors the absolute locktime ones, with a transaction field and an opcode. But there is a bit more complexity since they were added in a backwards compatible fashion and kinda shoe-horned in. Relative timelocks use the sequence field on an input. So in bitcoin lingo, locktime == absolute and sequence == relative. The relative opcode mirrors the name: OP_CHECKSEQUENCEVERIFY. The opcode works the same basic way, enforcing that an input has a sequence which forces the UTXO being spent to have been confirmed for a certain number of blocks or amount of time.
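
A sketch of how BIP68 packs a relative timelock into the 32-bit sequence field. Per the BIP: bit 31 set disables the lock, bit 22 selects time-based locks (512-second granularity) over block-based ones, and the low 16 bits hold the value:

SEQUENCE_LOCKTIME_DISABLE_FLAG = 1 << 31
SEQUENCE_LOCKTIME_TYPE_FLAG = 1 << 22  # set = 512-second units, unset = blocks

def relative_lock_blocks(blocks: int) -> int:
    assert 0 <= blocks <= 0xFFFF
    return blocks

def relative_lock_seconds(seconds: int) -> int:
    return SEQUENCE_LOCKTIME_TYPE_FLAG | (seconds // 512)

print(hex(relative_lock_blocks(10)))      # spendable 10 blocks after confirmation
print(hex(relative_lock_seconds(86400)))  # roughly one day after confirmation

Encoding relative timelocks in the sequence field per BIP68.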

Both timelock opcodes, OP_CHECKLOCKTIMEVERIFY and OP_CHECKSEQUENCEVERIFY, force a transaction to have an “at least” timelock on it. Kinda interesting that it is explicit on both sides; I am not sure of the benefit yet other than making sure both parties are on the same page, I’ll think through it. I also initially wondered why the absolute timelock is at the top transaction level, whereas the relative timelock is per input. But it makes sense if you walk through it: relative timelocks are relative to the UTXOs they are spending, and those can be different for each input. Absolute timelocks are absolute, so if any of them have not “passed” yet the transaction is not valid; might as well just put the one furthest out in the future at the transaction level. Maybe there would be a bit more flexibility for transactions built by multiple parties if it were also on the input level, but apparently there hasn’t been much demand for that.

Timelocks are heavily used in the “layer 2” Lightning Network.

OP_RETURN

OP_RETURN is a weird opcode which fails the Script program immediately. It’s a way to embed some data into the blockchain without burdening the UTXO set, since nodes can recognize that any output script with OP_RETURN is un-spendable and not worth keeping in memory.

P2PKH

A bitcoin script can define many different and clever ways to lock an output. But some patterns are so useful, a lot of the bitcoin ecosystem is tailored around them. One of the popular OG (but not the oldest) patterns is Pay to Public Key Hash.

Pay to Public Key Hash, P2PKH, is a pattern to send bitcoin to a single person. In this case, a person holding the private key of a public/private key pair. If the user wants to spend the bitcoin in this UTXO (for example, send it to another public key hash), they need to use their associated private key to provide a signature which solves this scriptPubKey. The check signature operator, OP_CHECKSIG, is essential for this script.

OP_DUP OP_HASH160 <PubkeyHash> OP_EQUALVERIFY OP_CHECKSIG

Public key hash locking script.

The OP_DUP OP_HASH160 <PubkeyHash> OP_EQUALVERIFY makes sure that the OP_CHECKSIG can only be checked with the intended user’s public key. Without this, a user could provide any public key and an associated signature.

OP_CHECKSIG returns true if a signature signs the correct parts of a transaction and matches a provided public key. The really interesting part is which parts of the transaction are hash’d for this signature. Transactions signal which parts a signature covers through the SIGHASH flag. It is safe to assume that the owner of the public key is cool with whatever parts of the transaction are covered by the signature and that those parts haven’t been tampered with by a bad party.

So the P2PKH script ensures that only the owner of the public key can use the bitcoin.

A public key is a form of identification and is heavily used in the bitcoin ecosystem, so what’s a “public key hash” and why is it better than just using a plain old public key?

Blockspace is limited and costs money, so any way to save a few bytes is useful. Hashing the public key is a good security-to-performance trade-off since it really doesn’t hurt the security aspect at all, but shaves off quite a few bytes. bitcoin is a bit quirky and takes the pubkey, sends it through SHA256 and then through RIPEMD160 to create the hash. We would have to ask Satoshi why both, but we end up with a value that takes up less space.
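
The hash160 construction as a Python sketch (note: ripemd160 availability in hashlib depends on the OpenSSL build backing it; the hash160 name and example pubkey are just for illustration):

import hashlib

def hash160(pubkey: bytes) -> bytes:
    # SHA256 first, then RIPEMD160: 33-byte compressed pubkey in, 20 bytes out.
    sha = hashlib.sha256(pubkey).digest()
    return hashlib.new("ripemd160", sha).digest()

compressed_pubkey = bytes.fromhex(
    "0250863ad64a87ae8a2fe83c1af1a8403cb53f53e486d8511dad8a04887e5b2352")
print(hash160(compressed_pubkey).hex())  # the 20-byte PubkeyHash for P2PKH

33 bytes of pubkey squeezed down to a 20-byte hash.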

addresses

Bitcoin addresses are part of the tooling built on top of popular script patterns to make them easier to use. Bitcoin addresses are short strings which encode virtual spots on the blockchain to send bitcoin to. Bitcoin wallet software knows how to decode these addresses and construct the script the address describes.

For P2PKH, the address always starts with a 1 (different numbers for different patterns) so it’s easy for a human to quickly know what they are dealing with. These addresses also encode information so that computers can verify we humans haven’t fat-finger’d a number and just sent some bitcoin to /dev/null.

Base58Check was the original bitcoin address encoding scheme, but since the SegWit softfork, bech32 is the new hotness.

The term “address” may not be the best from a usability perspective. It gives the impression that an address should be used consistently, like a bank account number. But addresses should be thought of more like invoices. Address re-use degrades privacy and security. Privacy is degraded because the blockchain is public info; users can analyze it and start to triangulate an identity. Address re-use makes their job way easier, to not only identify the re-user, but also those who send to the re-user. Security is degraded with address re-use because it opens the user up to niche attacks where a private key can be stolen. These attacks are pretty difficult to pull off in reality, since an attacker would need to derive the private key from multiple public signatures, but there is no reason to risk it.

fees

The blocksize limits the number of transactions that can fit in a block. How many transactions? That depends on the size (bytes) it takes to describe a transaction. One can imagine a transaction which uses a bunch of UTXOs to pay a big sum to someone. This transaction requires a lot of unlocking scripts, one for each UTXO. This is going to take more bytes on the blockchain than a transaction which only uses a single, simple UTXO. If a miner has to choose between one big transaction or a few small ones, all other things equal, well then it will grab the small ones and make more on fees. So the big one needs to put up a larger fee. The sats/byte ratio is used to see what it would take to get a transaction into the blockchain given the current market.

What if you create a transaction and the fee is too small? Wait it out? That is one option, but two tools are available to bump the fee of a transaction.

The first is called “replace by fee” (RBF). This is a policy where node operators will replace an existing transaction in the mempool with a new one which spends at least one of the same UTXOs, but pays a higher fee. An important point here is that this is just a policy, not part of the bitcoin protocol. Miners are free to include any transaction in a block that they want, including an old one that a user attempted to bump a fee on (although miners don’t have much incentive to grab a tx with less fees). But because this is just a node policy, there can be many wrinkles to exactly how a node and its operator support RBF. BIP125 introduced an RBF policy (nowadays called “opt-in” RBF) which leverages the weird sequence number field on a transaction. If any of a transaction’s inputs uses a sequence number less than fffffffe, then it is signaling that it can be replaced (opting in). The replacement has to pay a higher fee than the original (if following this policy). An alternative policy gaining traction these days is called “full RBF” which allows any transaction to be replaced, no opt-in necessary. Something to note about any RBF policy is that only the original transaction creator(s), with the secrets to unlock the UTXOs, can create a new transaction to bump the fee.

A second tool to bump a transaction fee is a pattern called “child pays for parent” (CPFP), also known as “ancestor feerate mining”. This is where a second transaction is created which spends an output of the first transaction. A relatively large fee is placed on this second transaction in order to incentivize miners to include both the original stuck transaction and the new one in a block. One nice aspect about CPFP is that usually either the sender or receiver can decide to create a CPFP transaction. The sender would attach it to the original transaction’s change output, which is to them, where the receiver would attach it to the UTXO headed to them. Technically any number of transactions could be chained together (I guess the upper limit being the number of transactions in a block? Gotta think on that.) and as long as the last one has a large enough fee a miner could choose to include them all in a block. However, default node policy caps a chain of unconfirmed ancestors at 25.

Both fee bumping mechanisms are exposed to “pinning” attacks, where someone attaches a new transaction to the original which either makes it prohibitively expensive to pay for all the fees or in some way breaks node policies so the fee-bumping transactions won’t be relayed.

dust

Dust is a UTXO which holds so little value that it is more expensive to pay for a transaction to get on the blockchain than the output’s value. This is a moving target given the market for blockchain space is always changing.

Even in the best case scenario for a transaction trying to spend “dust” UTXOs, where there are tons of cheap (size-wise) inputs and only one output whose cost is amortized, there is a physical minimum size of a UTXO input unlock script. In this best case scenario, the input is for a taproot UTXO and would be around ~60 vbytes. The cost of this input alone is determined by the current blockspace fee; let’s say it’s 15 sats/vbyte. That means the UTXO needs to hold more than 60 * 15 = 900 sats to be economically viable.
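
The arithmetic as a sketch (numbers from the example above; the helper name is made up, and feerates are whatever the market says that day):

import math

def min_viable_utxo(input_vbytes: int, feerate_sats_per_vbyte: float) -> int:
    # A UTXO below this value costs more to spend than it is worth.
    return math.ceil(input_vbytes * feerate_sats_per_vbyte)

print(min_viable_utxo(60, 15))  # 900 sats at a busy 15 sats/vbyte
print(min_viable_utxo(60, 2))   # 120 sats at a calm 2 sats/vbyte

Dust is a moving target set by the current feerate.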

It is easy to see the incentive to do some house cleaning when fees are low, combining dust UTXOs into one so that the produced UTXO is still economically viable when the mempool is full later on.

mempool

The mempool (Memory Pool) is the backlog of transactions waiting to be confirmed on the blockchain. It is often described as “The mempool”, but there isn’t actually a single mempool somewhere. Bitcoin consensus only covers the transactions in the blockchain, not those waiting to get in. Every bitcoin node, including miner nodes, has its own mempool. As nodes gossip new transactions that want to get onchain they add them to their mempool. So even though there isn’t “a” mempool, everyone’s pools are probably pretty similar at any given moment.

While the mempool isn’t covered by consensus, there are strong “policy” conventions established by Bitcoin Core. These policies attempt to create the premier market for pricing transactions into the blockchain, with the goal of keeping things as distributed as possible. It is always possible for users to go directly to miners with a transaction, bypassing these policy rules. But users and miners both have high, aligned incentives for this distributed market. Miners want to be able to easily grab the most valuable transactions to put in a block, a.k.a. the ones that give them the most fees. Users want a very clear picture of the current fees in order to pay the lowest fees possible for their transactions.

grinding

The term grinding gets tossed around a bit and, funny enough, it is kinda similar to mining, but maybe we just wanted a new name. When we hash something, by definition we have no idea where it is going to end up in the possible output set. But there are times we want that output to have certain characteristics, so we “grind” inputs until the output has them. When blocks are mined, miners are grinding for hashes with long prefixes of zeroes, and those are controllably very rare. But there are other scenarios where the characteristic is a 50/50 shot. An example is ECDSA R value size.
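
Grinding as a sketch; the same shape as mining, just a different target property (the grind helper is made up for illustration):

import hashlib
from itertools import count

def grind(payload: bytes, has_property) -> bytes:
    # Re-hash with a changing counter until the digest has the property.
    for i in count():
        candidate = payload + i.to_bytes(8, "big")
        if has_property(hashlib.sha256(candidate).digest()):
            return candidate

# A 50/50 property (top bit clear) takes ~2 tries on average;
# a mining-grade run of zero bits takes... a while.
result = grind(b"some data", lambda digest: digest[0] < 0x80)

Grinding inputs until the output has a desired characteristic.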

Witness Improvements

Bitcoin is a technical marvel and might just save us all, but it wasn’t born perfect. Bitcoin Improvement Proposals (BIPs) are technical proposals to improve the system. BIPs come in all shapes and sizes, addressing any part of the bitcoin system. Many of them have been merged into the protocol over the years.

Of all the BIPs merged so far, I don’t think it is too crazy of a take to say that some have had a disproportionate impact on the protocol. A particular set of BIPs I am vaguely calling the “Witness Improvements” have probably had the largest impact. These BIPs were merged in three big steps: P2SH, SegWit, and Taproot.

Each step, comprised of one or more BIPs, brought some new functionality to the protocol. But all the “Witness Improvements” share some common goals.

  1. Make bitcoin more efficient, enabling decentralization.
  2. Make bitcoin more private, making it more useful.
  3. Remain backwards compatible so it remains useful.

The technical aspect that ties these changes together is that they focus on the data which historically (spoiler!) lived in the scriptSig part of the bitcoin transaction.

P2SH // BIP16

Pay to Script Hash, P2SH, scripts were standardized in BIP16. They allow transactions to be sent to a script hash (address starting with 3). Why is this useful and such a game changer?

Script is pretty powerful, but it puts the burden on the sender to come up with the appropriate locking script. For P2PKH, this is really easy because there is a standard address to send to a person. But what if a user wants to create a shared UTXO where any one of three people could spend it? They need to get this multi-sig script to the sender and hope they copy/paste it right. P2SH allows the receiver to define the locking script and then just send the hash to the sender instead. This “receiver defines the invoice” pattern is closer to how transactions actually happen in real life, with the receiver usually defining an invoice which the payer agrees to pay.

Verifying a P2SH transaction is a whole different beast. The P2SH locking script pattern is recognized by node software as a “special” form, a template of sorts. So it gets executed slightly differently to “normal” legacy scripts. Two code paths is a heavy cost in complexity, but the benefits are too good here.

OP_HASH160 <20-byte-hash-of-redeem-script> OP_EQUAL

A P2SH output since it matches the specified template.

OP_0 <sig A> <sig B> <redeem-script>

Example input script showing data outside (or to the left of) the redeem script. Sidenote: the OP_0 offsets the multisig bug.

  1. The input and output scripts are concatenated like always and executed to verify that the given redeem script’s hash matches the one committed to in the UTXO. If it checks out, old interpreters which don’t know about P2SH will just see the TRUE on the stack and think all is good (CLEANSTACK doesn’t apply here, so the rest of the data on the stack is ignored).
  2. Now if the interpreter is P2SH aware, it recognizes the template of the UTXO and runs another validation step. The redeem script is decoded (a.k.a. “unpacked” into the running program) and run on a new stack which doesn’t have that TRUE data on top (see the sketch after this list).
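
A sketch of that two-phase shape in Python. The validate_p2sh name is made up, and run_script stands in for a real Script interpreter:

import hashlib

def validate_p2sh(committed_hash: bytes, redeem_script: bytes,
                  stack: list, run_script) -> bool:
    # Phase 1: does the provided redeem script hash to the UTXO's commitment?
    sha = hashlib.sha256(redeem_script).digest()
    if hashlib.new("ripemd160", sha).digest() != committed_hash:
        return False
    # Phase 2: deserialize the redeem script and run it against the
    # remaining stack items (signatures etc.) on a fresh stack.
    return run_script(redeem_script, stack)

P2SH validation: hash check first, then the real script.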

I am not sure of use cases, but technically consensus allows non-data-pushes in a P2SH input script (outside of the redeem script). This is technically required for P2MS scripts like the one above to deal with the OP_0 bug. But standardness rules might not be as forgiving as consensus (scriptsig-not-pushonly).

Random thought, but what happens if a user locked up some coins with a script which by chance matches the P2SH template? They are not locking to a script, but just some random preimage. Do nodes still try to decode the preimage as a script and fail, locking the coins forever? Sounds like the answer is simply “yes”, but there were not many scripts matching the template when P2SH was activated.

If we could go back in time and make one change to the original protocol, I think this is it. The invoice model is much easier for users to use and understand. We would also be able to drop tons of complexity in the protocol if the original flows didn’t exist at all. But, live and learn.

SegWit // BIP141 + BIP143 + BIP144 + BIP145

The SegWit (“segregated witness”) upgrade was a large bundle of BIPs proposed back around 2015 and activated in 2017.

One goal of SegWit was to fix transaction malleability. Transaction malleability allows a transaction to be tweaked in a way that leaves it almost entirely the same, but with a new ID. An example tweak is to pad the signature script with additional instructions that aggregate to nothing (OP_DUP OP_DROP). The new transaction ID means any transactions based on the old ID are now invalid. This makes it really difficult to lock in the digital contracts necessary for things like the Lightning Network.

Another big goal was to fix the quadratic sighash issue. In pre-SegWit bitcoin, the cost to create and verify a transaction’s signatures scales quadratically (O(n^2)) with the number of inputs. This isn’t great because it eats away at bitcoin’s leaderless consensus value prop by bogging down small nodes, pushing towards centralization. It wasn’t obvious to me why this was quadratic. The crux is that each of the n input signatures hashes a message whose size itself grows with the number of inputs, so it is n hashes over O(n) data: n x n. SegWit addressed this with a new signature scheme (BIP143) where the shared transaction parts are hash’d once and reused across inputs, making the per-input work effectively constant.

A third improvement was a change in how bitcoin addresses are encoded from Base58Check to Bech32. There are two big benefits to Bech32. First, it uses a character set that doesn’t mix uppercase and lowercase letters, protecting human users from making mistakes. And second, the checksum validation is much safer, even allowing in some scenarios for mistakes to be corrected not just recognized.

There are other improvements bundled in the big SegWit change that I am skipping for now, but one last big requirement was to make all the changes backwards compatible (otherwise known as a “soft” fork). This just means that clients running old code wouldn’t break, but it doesn’t require that they get the new benefits. The fact that this major of a change went in as a softfork is kinda a minor miracle from an engineering perspective.

So how was this pulled off?

The big change was moving the witness data (the input unlock scripts) into a separate data structure in a block. The transaction data that describes its effects (where bitcoin is coming from and going to) remains in the original location. Kinda clever: a commitment to all the witness data is stuck in a coinbase output which uses an OP_RETURN code, so old software ignores it (backwards compatible), but new software knows where to look and can verify the scripts. And very important, the witness data is still committed to by the block, since that coinbase commitment feeds into the block header hash. Changing the witness data would require a new proof of work for the block.

Before SegWit, it was pretty simple for miners to calculate the most cost-effective transactions to put in a block. A block had (has? depends…) a max size of 1MB. A miner just needs to maximize tx fees / tx bytes. Small transactions (as in script size bytes, not value) with high fees are great! Large transactions with small fees suck! Easy. But how are SegWit transactions measured now that part of the transaction is stored somewhere else? There are a lot of things to weigh here.

One factor is that witness data, pre and post SegWit, is never in the UTXO set. This means it puts less of a burden on the bitcoin system than the other parts of the transaction. Perhaps this data should get a relative “discount” to encourage more usage here? Another factor is that old pre-SegWit nodes have a 1MB blocksize limit, and changing that would be a hardfork (these nodes wouldn’t accept a 1MB+ block). While post-SegWit witness data is still stored in a block, it is stored in a new spot which is not included in the old 1MB calculation.

SegWit introduced a new calculation to figure out the “weight” of a transaction. These weight units (WU) are more abstract than the straightforward bytes of old, but not too complex. The new policy for blocks is that they can only be 4M WU, instead of 1M bytes (1MB). But how are the weight units calculated? 1 byte of base data is 4 WU while 1 byte of witness data is 1 WU. Two really key points here. First, old pre-SegWit blocks are made up of 100% base data, so the blocksize rule remains the same in their eyes (backwards compatible). Second, the witness data is getting a discount. If a transaction can be re-written to move more of its logic into the witness data, then it will be cheaper to get on the blockchain. This incentivizes putting much less burden on the UTXO set.
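
The weight arithmetic as a sketch (the transaction sizes below are rough, just to show the shape):

import math

def weight_units(base_bytes: int, witness_bytes: int) -> int:
    # Base (non-witness) bytes count 4x, witness bytes count 1x.
    return base_bytes * 4 + witness_bytes

def vbytes(base_bytes: int, witness_bytes: int) -> int:
    # "Virtual bytes" scale weight back to the old 1MB frame of reference.
    return math.ceil(weight_units(base_bytes, witness_bytes) / 4)

# A one-input, one-output P2WPKH spend, approximately:
print(weight_units(base_bytes=82, witness_bytes=110))  # 438 WU of the 4M limit
print(vbytes(base_bytes=82, witness_bytes=110))        # 110 vbytes

SegWit weight: witness bytes at a 4x discount to base bytes.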

I am still looking for an ah-ha moment for why the witness data is given a 1:4 discount, versus something like 1:5 or 1:10. We could have rolled with no relation at all, but this would have added another dimension to the miner knapsack challenge of choosing transactions to put in a block. That needs to be avoided at pretty much all costs since it would lead to miner centralization as specialization occurs. But why 1:4? It appears that in practice, 1:4 is a good spot where input and output costs are generally equal, which is nice to keep incentives easy to reason about. I am not sure if this was totally on purpose or just a happy accident, but it appears to kinda fix an issue with legacy transactions where outputs were much cheaper to create than inputs. If outputs are much cheaper to create than inputs, it pushes users to spend fewer UTXOs and create more small change UTXOs, which burdens the UTXO set. On the other hand, we probably wouldn’t want to make inputs too cheap since it would start to eat away at privacy if everyone was consolidating. In any case, this is all still covered by consensus rules and is totally deterministic, so whether it is the perfect ratio or not, this is the way.

If the signature data is stored somewhere else, what do SegWit inputs and outputs look like? A transaction paying to a SegWit address has an output which just loads data onto the script stack. The input’s scriptSig (the old spot for the unlock script) is empty (the whole point of all this!). Checkout the backwards compatibility though: if an old client validates this transaction, the script will end with data on the stack, so the transaction is valid. To old nodes it looks like an “anyone can spend” transaction. Newer clients however recognize this SegWit data on the stack and perform further validations. Transaction malleability isn’t possible with SegWit transactions since the scriptSig is now empty, its contents moved to the new witness field. And while the txid doesn’t commit to the new field, in order to avoid malleability, the new wtxid does. These two IDs now split the job the txid used to have. The txid continues to be used to chain transactions, and is safe to use with un-broadcasted transactions since there are no malleability concerns. And the wtxid is used by miners to commit the whole transaction to a block, so the witness data is still “covered” and cannot be changed once confirmed.

SegWit introduced two new common transaction addresses to match the most used P2PKH and P2SH: P2WPKH and P2WSH. SegWit enabled wallets know how to decode these addresses and piece together the familiar script templates, but pull the data from the new spots. For interop with old clients, it’s even possible to embed a SegWit script into the old P2SH.

SegWit outputs for P2WPKH and P2WSH have very specific templates: a version byte plus either a 20-byte data push or a 32-byte data push, respectively. For the initial SegWit softfork, the version byte is just 0 for v0. A SegWit enabled node recognizes these templates and knows to look in the segregated witness structure to validate inputs spending them.

witness stack

The witness data is no longer stored directly in the scriptSig field, but over in a witness stack. There is one stack per input.

The witness validation execution is similar to P2SH. The special validation is enabled by the output matching a template, but for SegWit it is a version byte and then a 32-byte hash of the witness script (the name for the redeem script in this context). The witness stack looks like a P2SH input, with data elements on the left and a data push of the witness script at the end. The witness script is hashed and checked against the UTXO’s hash, but this is now implicit (vs. the explicit P2SH output template). The script is then deserialized and run against the data provided in the stack.

Taproot // BIP340 + BIP341 + BIP342

Taproot was the third big softfork bundle of improvements activated in late 2021. It extends upon SegWit which extends upon P2SH.

A quick summary of these big leaps in the bitcoin protocol:

  1. P2SH moved the locking script from the output to the input.
  2. Segwit moved the script out of the transaction.
  3. Taproot took advantage of both to ease script restrictions.

I think it’s easiest to start with what P2SH addressed. Before P2SH, payers were required to describe locking scripts in the transaction outputs. This makes sense from the payer-defines-the-transaction perspective, but generally, it’s the payee who knows how they want to lock up funds. Having the locking scripts in the outputs also means they end up in the expensive in-memory UTXO set of every node. Pay To Script Hash flips this and puts the locking script in the input that unlocks the output. The output now just contains a hash of the script. For the output to be unlocked, a user must provide a script that hashes to the output’s hash, and the script must return true.

SegWit improved the performance of the system by moving the unlock scripts out of the transactions, allowing them to be pruned in most parts of the system.

So P2SH and SegWit were game-changers, but some weaknesses remained. First, the entire unlock script needs to be posted to the blockchain in order to unlock an output, even if only one of many sub-branches is relevant. Imagine a script with many if statements where only one of them actually unlocks the output, but the rest are still sitting there on the blockchain. This has a negative performance and privacy impact. For performance, there are a lot of wasted bytes carrying around those extra unused logic paths. Nodes have to enforce some “max size” settings to protect against a large script bogging down the network, and this limits scripts even if the code path that ends up getting used is relatively small. For privacy, the extra paths disclose unnecessary information. For example, on a lightning channel close, the transaction posted to the blockchain gives away enough information that it’s a safe bet both parties are operating lightning nodes.

Taproot fixes the remaining performance and privacy issues of P2SH and sets the stage for future upgrades. The two keys to the upgrade were Schnorr Signatures (BIP340) and Merklized Alternative Script Trees (MAST) (BIP341) which were then codified in script with BIP342. And with these performance benefits in place, taproot scripts have more relaxed rules, but let’s dig into it.

Schnorr signatures are pretty much better in every way than bitcoin’s historically used ECDSA signatures. This makes sense because ECDSA was developed just to get around patent issues with Schnorr, so it’s essentially water’d down. The Schnorr patents have since expired, so it is now free to use! This opens the door for interesting signature use cases which are now way more performant. The term “scriptless scripts” has popped up to describe these new flows since they generally take Script patterns and bake them into the signatures themselves.

First up is simple key aggregation (linearity). Public keys can easily be summed together to form a new public key which can be used in n-of-n multi-signature scenarios, sometimes called scriptless multi-signatures. Historically, multi-signature scripts required signatures from all n users. Now these look like simple one-signature scripts. This helps both from a performance and privacy perspective since only one sig ends up on-chain. There is coordination complexity though to create that one signature from multiple parties. Codifying a safe and performant way to do that dance is being done in the MuSig work. Other scriptless script patterns include threshold signatures where k-of-n users can create a signature (checkout FROST) and adaptor signatures where a secret can be baked into a signature (these bring PTLCs to the Lightning Network).
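
A sketch of that linearity for a naive 2-of-2, in the same loose notation used later for taptweaks. Naive on purpose: real MuSig tweaks each key first to block rogue-key attacks.

// Naive 2-of-2 aggregation; MuSig layers per-key tweaks on top of this.
P = P_a + P_b              // aggregate public key
R = R_a + R_b              // aggregate nonce
e = H(R || P || m)         // challenge shared by both signers
s = s_a + s_b              // each side contributes s_i = r_i + e * p_i

// (R, s) verifies as one ordinary Schnorr signature: s*G == R + e*P

Sums of Schnorr keys and signatures still verify as one signature.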

The other big change in the softfork was Merklized Alternative Script Trees (MASTs). These are a new (for scripts) data structure that allows a script to only reveal the path which is used to unlock an output. A script which historically would have some if statements can instead be modeled as a tree, with each leaf node being a possible unlock. So a classic lightning payment channel HTLC output, which has a timelock clause to pull back funds on a failed transfer, would only expose this clause if it has to be used. This makes it much less obvious that the transaction is part of the lightning network. And even better, the unlock script only has to publish the “leaf” it’s using to unlock, freeing up all the wasted bytes which used to describe the rest of the possible paths.

With all these performance improvements in place, some “standardness” policy rules were relaxed on the p2p network layer.

outputs, inputs, and script

So how does this look in practice? In the post-P2SH world, there have always been two common patterns: pay-to-public-key and pay-to-script. Taproot merges these back together since every output is now a tree. If we just look at the SegWit v0 (post-SegWit softfork) templates, P2WPKH and P2WSH, those outputs are always 22 bytes and 34 bytes respectively. It is obviously very easy to spot these onchain. Taproot still supports these two use cases, but every Taproot output is just 34 bytes (btw, the transaction size calculator is a nice tool to get an idea of things). It is no longer easy to see what an output is until it is spent, and even then, it might still be impossible to guess if it is a scriptless-signature.

Right off the bat I was wondering if an output always being 34 bytes, more than the 22 bytes of the super efficient P2WPKH, would cause users to not migrate to Taproot. It turns out though that P2WPKH is only about 1.5% more efficient across a whole transaction, and only in best case scenarios, which aren’t common. So there is no reason to not always use P2TR.

A P2TR output is a SegWit v1 output. This is the first change to leverage the versioning system SegWit introduced. I have to imagine this made the change a lot easier to rollout since SegWit-aware nodes would see the 1 at the start of the output and know “Hm, it is a SegWit transaction, but I only understand v0, so just going to auto-approve”. Compare this to the big shifts which P2SH and SegWit itself had to make for node validation logic. After this fork in the road though, all the supported validation can be different…and with taproot it is pretty different.

OP_PUSHNUM_1
OP_PUSHBYTES_32 9515a27af408da5b84ffbfdf376fe3ef459c5dea7d56e45f6054a6304ebb47b7  

Example P2TR output.

The Taproot v1 output contains just a 32-byte Schnorr public key, the taproot output key. Anything with a different size is simply treated as unknown: “I don’t know what this is”, stamp it valid. I was curious why this was a full public key instead of a hash’d one, which previous output templates have made use of to save on blockchain space, and BIP341 mentions in point #2 of the rationale that the size trade-off wasn’t worth some of the collision implications. So all Taproot outputs look like “pay to pubkey”.

Since Taproot scripts use Schnorr, a single pubkey can cover a few common use cases, like pay to person and pay to multisig. What is neat is that even after these outputs are spent, there is still no indication onchain of which use case was used.

The Taproot script validation has a tad more going on though. The first big logical fork depends on how many items are in the witness stack. If there is just one, it is assumed to be a signature for the output pubkey. This is called a keypath spend in Taproot lingo since no further script is involved. These will probably make up the bulk of P2TR spends since they cover the common use cases and are the cheapest onchain. But a more complex output requires a scriptpath spend.

Scriptpath spends are where the MAST structure is placed so only the script being spent is revealed onchain. The Taproot validation assumes it is a scriptpath spend if there is more than one item on the witness stack.
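
A minimal Python sketch of that fork, where schnorr_verify, verify_leaf_commitment, and run_script are assumed helpers (and the optional annex element from BIP341 is ignored for brevity):

def validate_taproot_spend(witness, output_key):
    if len(witness) == 1:
        # Keypath spend: the lone witness item is a signature
        # for the 32-byte output key.
        return schnorr_verify(output_key, witness[0])
    # Scriptpath spend: the last item is the control block (internal key
    # plus merkle path), the next-to-last is the leaf script being run,
    # and everything before that satisfies the script.
    control_block, leaf_script = witness[-1], witness[-2]
    return (verify_leaf_commitment(output_key, control_block, leaf_script)
            and run_script(leaf_script, witness[:-2]))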

Even in the simplest keypath spends, it is recommended that wallets still tweak the key once so that there is no possible script path. The taptweak is the commitment hash used to tweak the initial internal key. There is nothing consensus-wise to enforce this since no one can see whether you did or didn’t. It also simplifies things a bit with what is being hashed and tweaked. A keypath taptweak commits to just the public key, while a scriptpath taptweak commits to the public key concat’d with the merkle tree root.

// Keypath key from a taptweak of just the internal key P.
Q = P + H(P)G  

// Scriptpath key from a taptweak of the internal key P and the root of the merkle tree.
Q = P + H(P||Merkle_Root)G

Best practice is to always tweak the internal key so there are no undefined spending paths.
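
The H here is a BIP340-style tagged hash. A minimal Python sketch of just the hashing half (the point addition Q = P + tG needs a secp256k1 library, so it’s left out):

import hashlib

def tagged_hash(tag: str, msg: bytes) -> bytes:
    # BIP340 tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg).
    tag_digest = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_digest + tag_digest + msg).digest()

def taptweak(internal_key: bytes, merkle_root: bytes = b"") -> bytes:
    # Keypath: just the 32-byte x-only internal key.
    # Scriptpath: the internal key concat'd with the script tree root.
    return tagged_hash("TapTweak", internal_key + merkle_root)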

The script tree is a merkle tree, and the leaves are the alternative scripts. This isn’t an abstract syntax tree where there are conditional “branches” (e.g. if this go here, else go here). Instead, each leaf is an independent, alternative script. If there is just one leaf script then it is the root.

The witness of a scriptpath spend needs to contain the script itself and any data to satisfy the script, just like P2SH outputs of old. But it also needs a few more bits of data to confirm that the script was committed to in the merkle tree: the initial internal public key (pre any tweaks) and the merkle path from the script’s leaf node to the root. The merkle path is the set of sibling hashes needed to hash the script all the way up to the root.

Initially this sounds like a lot of extra byte overhead compared to a simple P2SH. And it might actually cost something like an extra 32 bytes in the “single script” case. But since the overhead grows logarithmically, it only takes a few alternative scripts for this to pay off fast. With eight alternatives, for example, the merkle path is just three 32-byte hashes instead of publishing the other seven scripts.

public keys and parity bits

Taproot switches to the Schnorr signature scheme, but it is using the same secp256k1 curve for public/private keys. The public keys are encoded differently though. Public keys are points on a curve (vs. private keys which are 256-bit/32-byte scalars), so they are made up of a 32-byte x and a 32-byte y value. A “compressed” encoding of a public key is just the x value and a single byte used to flag if the y value is even or odd, 33 bytes total. In Taproot, the public key is just 32 bytes and y is assumed to be even. This saves a byte, but introduces some extra overhead when validating tweaked keys on script spends: a parity bit needs to be passed along to flag odd keys.
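
A little illustrative Python for the two encodings, assuming you already have the point’s x and y coordinates as integers (the function names are made up for the sketch):

def compressed_pubkey(x: int, y: int) -> bytes:
    # Legacy 33-byte encoding: 0x02 prefix for even y, 0x03 for odd y.
    prefix = b"\x02" if y % 2 == 0 else b"\x03"
    return prefix + x.to_bytes(32, "big")

def x_only_pubkey(x: int) -> bytes:
    # BIP340 32-byte encoding: just x, with an even y implied.
    return x.to_bytes(32, "big")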

merkle paths

Ok, so you have a witness script you want to add to a UTXO, how exactly do you hash that up and embed it in the internal key with a tweak?

At the bottom of the MAST, the leaves are hash’d with tagged_hash("TapLeaf", leaf_version || compact_size(s) || s). Pairs of nodes are then hash’d together on the way up, sorted lexicographically, under the "TapBranch" tag until you hit the root.
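
Reusing the tagged_hash helper from the taptweak sketch above, the leaf and branch hashing might look like this (the compact_size here only handles scripts under 253 bytes):

def compact_size(n: int) -> bytes:
    # Minimal CompactSize encoding, enough for short scripts.
    assert n < 253
    return bytes([n])

def tapleaf_hash(script: bytes, leaf_version: int = 0xC0) -> bytes:
    data = bytes([leaf_version]) + compact_size(len(script)) + script
    return tagged_hash("TapLeaf", data)

def tapbranch_hash(a: bytes, b: bytes) -> bytes:
    # Children are sorted so a verifier doesn't need to know their order.
    left, right = sorted([a, b])
    return tagged_hash("TapBranch", left + right)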

Wallets

A bitcoin wallet is a special type of client application. The name “wallet” is kinda misleading, since a new user might assume the application is “holding” coins like a real-world wallet holds bills. But coins exist on the public blockchain. Wallet apps hold the secret material necessary to move coins on the blockchain. This is commonly private keys used to sign transactions, but there could be other details needed for special scripts.

coins vs. accounts

The bitcoin blockchain only keeps track of how coins are locked up. When you hear terms for applications like “wallets” you might assume there is more of an account system under the hood. But that isn’t the case. For a bitcoin wallet app to show a user’s “account balance” in bitcoin, the wallet client software needs to know which outputs (usually derived from public keys) the user owns. It can then scan the blockchain and sum them up, but there is no simple “account” primitive on the chain.
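
Put another way, a wallet’s balance is a derived number, something like this one-line sketch (the utxo_set and wallet.owns helper are hypothetical stand-ins):

# Sum the values of all unspent outputs this wallet knows how to unlock.
balance = sum(u.value for u in utxo_set if wallet.owns(u.script_pubkey))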

Some other blockchains, like Ethereum, actually use an account system under the hood. Transactions are not “unlock, lock”, but instead are “spend from balance, add to balance”. It is a subtle difference, but has a butterfly effect on all the systems built on top. This results in all sorts of technical trade-offs, but the one I am most interested in relates to user privacy. The account system has technical incentives for users to re-use their account instead of creating a new one per transaction. The coin output model has no such incentive. This lowers the privacy of the account system since it makes it easier to connect users to transactions. These little incentives tend to snowball in complex systems, which in this case severely affects user privacy.

hierarchical and deterministic // BIP32 + BIP39

While there are infinite ways to lock bitcoin, most of the time it comes down to “the person with this secret private key can unlock these bitcoin”. And the private key is just some 256 bits. Humans are not very good at memorizing 256 0’s and 1’s though. And while you could use one private key your whole life, that is terrible for security and privacy. Ideally, you use a new private key per transaction. But wow, that is a whole ton of 0’s and 1’s! Get just one wrong and you are out of luck. The good news though is some patterns have emerged to help us out.

So how should someone go about choosing their secret 256 bits? And once they have them, how do they make them easy to remember? You could choose 256 0’s in a row, that would be really easy to remember. But it would also be easy for someone else to guess. So we want the bits to be totally random, but memorable…kinda at odds with each other. This is where BIP39 comes in.

BIP39 defines a process where some number of random bits are instead represented by just a handful of words. All a user has to do then is remember the words, which humans are generally pretty good at. There are a few knobs to turn in the algorithm, but the popular version represents 128 bits as 12 words. But how? The key is defining a mapping of 2048 words to 11-bit values. 2^11 = 2048, so each permutation of 11 bits is assigned a word. If you choose 12 words at random from the list, you now have 11 x 12 = 132 bits of entropy (randomness). And don’t worry, the list of 2048 words is already defined in BIP39 so we ain’t changing that any time soon. Even cooler, the words were selected in such a way that you know which word it is with just the first four letters.

BIP39 includes a cleverer, checksum’d approach to choosing words though. The computer generates 128 bits of entropy and then takes the SHA-256 hash of it. The first 4 bits of the hash are the checksum, and they are concatenated with the original 128 bits. You now have 132 bits, exactly 12 words, but with the added benefit that the checksum is a quick way to validate the other 128 bits. Maybe the user wrote down a wrong word or something.
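
A minimal sketch of that entropy-to-words flow, assuming WORDLIST holds the 2048 BIP39 English words in order:

import hashlib, secrets

entropy = secrets.token_bytes(16)            # 128 random bits
check = hashlib.sha256(entropy).digest()     # checksum source

bits = bin(int.from_bytes(entropy, "big"))[2:].zfill(128)
bits += bin(check[0] >> 4)[2:].zfill(4)      # first 4 bits of the hash

# 132 bits / 11 bits per word = 12 words.
words = [WORDLIST[int(bits[i:i + 11], 2)] for i in range(0, 132, 11)]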

But why 128 bits? Why not represent a full 256 bit private key? Or even more, say 512 bits? Well, nothing is really stopping you, but then you have to remember your special scheme. BIP39 was designed to compose with BIP32. We will get into the details of BIP32 later, but the high level goal is to take the 128 bits and expand them to 512 bits. Off the cuff, this sounds a little insecure. We are mapping a 128 bit space (2^128 possibilities) to a 512 bit space (2^512 possibilities)? That means, at best, only a tiny fraction of the 512 bit space can ever be generated! But the power of cryptography is here to save us.

Cryptographic hash functions are used to map the 128 bits to 512 bits in a process known as key expansion. By definition, the functions have uniform distribution (don’t skew to one end of the field), are unpredictable (no idea where an input lands in the field), and are one-way (if you start at a point in the field, no idea what input gets there). This means it is infeasible to identify which numbers in the 512 bit field actually map back to the 128 bit field. If an attacker knows I used 128 bits to make my 512 bits, it gives them no extra information to find my 512 bits. They have to just brute force the 128 bit space, which is still computationally infeasible.

So to answer the earlier question, 128 bits are used because it is still secure and easier to memorize 12 words instead of 48. It is a trade-off.

BIP39 adds one more layer of security which I think brings a lot of UX complexity. In the key expansion process, right before the 128 bits are sent through the hash function wringer to be expanded, a salt is added. The salt is the static string mnemonic concatenated with a user-supplied passphrase, sometimes referred to as the 25th word (which is confusing cause it is totally different from the other words, and not even required). Essentially just tossing in some randomness so the output lands in a completely different part of the field. This brings a few subtle security upgrades, at the cost of the user needing to remember another word. And while there is no specified maximum length, there are performance costs to making it really long. Not to mention it’s harder to remember.
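
Concretely, BIP39 stretches the mnemonic into the 512-bit seed with PBKDF2-HMAC-SHA512 over 2048 rounds, which is short enough to sketch in full:

import hashlib
import unicodedata

def bip39_seed(mnemonic: str, passphrase: str = "") -> bytes:
    # Both strings are NFKD-normalized; the salt is the static string
    # "mnemonic" plus the optional passphrase (the "25th word").
    norm = lambda s: unicodedata.normalize("NFKD", s)
    return hashlib.pbkdf2_hmac(
        "sha512",
        norm(mnemonic).encode("utf-8"),
        ("mnemonic" + norm(passphrase)).encode("utf-8"),
        2048,
    )  # 64 bytes == 512 bits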

Salts do help protect against any pre-computation done by attackers (rainbow tables), but there are a few steps in key expansion which already protect against that, including the large input space (2^128) and the sheer number of calculations per derivation. The static part of the salt, mnemonic, is obviously known to everyone, so it doesn’t help against rainbow tables, but it does give a nice namespace, reducing any chance of collisions with other protocols.

Wallet apps have used the passphrase as a way to create two wallets for a user: one with the passphrase, and one with the same words but a blank passphrase. This opens the door to some plausible deniability for the user. But the different uses of the passphrase across wallet apps has led to fragmentation and a confusing user experience. Hopefully that gets iron’d out someday.

So with all that said, what does BIP32 actually define? It brings the hierarchical determinism.

Let’s say a user wants to control their keys from multiple devices, and they are following the best practice of creating a new key per transaction. So they have a bunch of private keys lying around. One way to do it is to copy/paste all the private keys and keep them in sync between devices as they are generated. Feels just a tad error-prone though. BIP32 describes an algorithm wallet software can use to generate (or re-generate) keys from a single secret seed. So given just one secret, you have infinite keys (or at least as many as you need in a lifetime). BIP32 also defines a way to describe paths to keys from the seed, so something like “the third key down on the right” (see the example path below). These allow users to give some hierarchy, like a tree with leaves, to their keys if they so choose (e.g. these keys are for business X and these over here are for business Y). As one can imagine, it is very important for a user to keep the seed secret since it is the key to all the keys.
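
Paths read left to right from the master seed m, one index per level, with h (or ') marking hardened derivation. A hypothetical taproot receive key following the common BIP86 layout:

// purpose / coin / account / change / address index
m/86h/0h/0h/0/3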

It is pretty easy to see how you can take BIP39 to define a secret seed, plug that into BIP32, and bam, you now have an easy to memorize secret which gives you infinite addresses.

The 512 bits give BIP32 a little more wiggle room to start spitting out keys with some cool properties. A naive key generation algorithm is just taking the hash of a 256 bit private key to get the next private key. This can go on forever, but if just one key is leaked, all the following keys in the chain are compromised. BIP32 uses its extra 256 bits, called the chain code, to put in some protection against this and more.
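
The naive chain, for contrast, is a one-liner, and shows why a single leak is fatal:

import hashlib

def naive_next_key(priv: bytes) -> bytes:
    # Each key is just the hash of the previous one, so leaking key n
    # hands an attacker key n+1, n+2, ... for free.
    return hashlib.sha256(priv).digest()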

These are called extended keys, which come in public xpub (extended public key) and private xprv (extended private key) form. These are just like normal public and private keys, but include extra information for generating the keys in their tree. One cool feature of BIP32 is that it’s possible to generate just the public key side of a tree using some elliptic curve maths. This can be useful for generating receive addresses or running a watch-only wallet, but these keys should still be protected for privacy’s sake.

A layer of security BIP32 brings is hardened keys, which are kinda like a firebreak. Non-hardened keys can be compromised if an attacker gets the xpub plus a child private key. That is enough information to calculate the parent private key with some algebra. The attacker can then derive all the child keys, essentially hacking the whole tree. If instead a hardened key is used, even if a child private key is compromised, the parent private key cannot be calculated. So the rest of the tree is safe.
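
The difference comes down to what gets fed into the HMAC during child key derivation. A rough sketch (the real math then adds the HMAC output to the parent key mod the curve order; see BIP32 for the full dance):

import hmac, hashlib

HARDENED = 0x80000000  # indexes at or above this are hardened

def derivation_data(parent_priv: bytes, parent_pub: bytes, index: int) -> bytes:
    if index >= HARDENED:
        # Hardened: hash the parent *private* key, so an xpub plus a
        # leaked child key reveals nothing about the parent.
        return b"\x00" + parent_priv + index.to_bytes(4, "big")
    # Non-hardened: hash the parent *public* key, which is what makes
    # public-only derivation (and the xpub attack) possible.
    return parent_pub + index.to_bytes(4, "big")

def child_entropy(chain_code: bytes, data: bytes) -> bytes:
    return hmac.new(chain_code, data, hashlib.sha512).digest()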

So you could just flip a coin 256 times and make a single private key, but why miss out on all this fun?

output descriptors

Output descriptors are a standard way to describe outputs and either the public or private materials needed to create them. This was kinda tried before with the ZPUB and YPUB encodings, but they were very specific and kinda muddied the waters on how to describe outputs.
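
A made-up taproot example (placeholder fingerprint, XPUB elided):

tr([deadbeef/86h/0h/0h]xpub.../0/*)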

For XPUBs, the stuff in the [...] is key origin metadata: the fingerprint of the master key and the derivation path down to the XPUB.

Running Bitcoin

Some quick tips for running a bitcoin node.

networks

Signet is similar in nature to testnet, but more reliable and centrally controlled. The bitcoin daemon process can be pointed at a signet. signet=1 by itself will point at the “default” signet. Custom signets need a signetchallenge and some server in addnode.

signet=1
[signet]
signetchallenge=something
addnode=address

bitcoin.conf

You can then check it’s all hooked up with something like bitcoin-cli -signet getblockcount.

remote connection

If connecting to a bitcoind instance from a different computer, local configuration can be placed in ~/.bitcoin/bitcoin.conf. I think at a minimum, rpcconnect, rpcuser, and rpcpassword are required.
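
Something like this on the client machine (all values are placeholders):

rpcconnect=192.168.1.42
rpcuser=youruser
rpcpassword=yourpassword

Client-side bitcoin.conf pointing at a remote bitcoind.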

watch-only wallet

I have only created watch-only wallets with the new descriptors syntax. There are older encodings, like ZPUB and YPUB, but I find them to be a confusing blend of private material generation stuff (HD wallets) and the type of outputs to scan for with those keys. So I’m just sticking to the new thing, which has a nice dividing line.

Even further, I only tried this for P2TR outputs, so a descriptor that looks like tr(<XPUB>/<INDEX>).

bitcoin-cli’s createwallet command has a bunch of flags, but I think only the disable_private_keys one needs to be flipped to make it watch-only.

bitcoin-cli -named createwallet wallet_name=watch disable_private_keys=true load_on_startup=true

Create a watch-only wallet called “watch”. The -named flag makes it a little easier to parse the args.

Then you can load a descriptor into the wallet with importdescriptors. You might have to use getdescriptorinfo first to get the checksum for the descriptor.

bitcoin-cli -rpcwallet=watch importdescriptors '[{ "desc": "tr(...)", "timestamp":1707933600 }, { "desc": "tr(...)", "timestamp":1707933600, "internal": true }]'  

Importing a descriptor and a related change descriptor.
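
Grabbing the checksum first looks something like this (descriptor elided):

bitcoin-cli getdescriptorinfo 'tr(...)'

The returned checksum then gets tacked onto the descriptor with a # before importing.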

I went on a bit of a journey with the hardened syntax being h or ', and it looks like it causes others pain as well. Going to just stick with h in the future. Still not sure why getdescriptorinfo appears to return different checksums…