// Fix the money, fix the world
Reactor online. Sensors online. Weapons online. All systems nominal.
Consensus
The value proposition of bitcoin is derived entirely from its leaderless consensus, known as Nakamoto Consensus.
The word consensus brings to mind a few things. Well…if I am being honest, these days I just think of bitcoin. But I used to think of other things! Like a jury where members are tasked with finding a verdict. Or computers in distributed systems which rely on quorums to make decisions. Juries and quorums are wildly different scenarios, but they share a commonality: trust is placed in relatively few members to pragmatically create consensus. Juries are small subsets of the population and quorums are small subsets of computers in a system. Generally, the time it takes to find consensus doesn’t scale well with the number of participants, so we trade trust for practicality.
This trust-for-consensus trade works for a lot of scenarios in life. But trust is power. And sadly, humans tend to abuse the shit out of power. So back to bitcoin’s value proposition: consensus, but without leaders. No leaders means no trust to abuse.
Sounds good, but what exactly is bitcoin’s Nakamoto Consensus creating consensus on? Just a set of virtual coins, some 1’s and 0’s. A currency of sorts.
scarcity
Setting aside exactly how it does it, bitcoin generates consensus over (just under) 21 million virtual coins. More specifically, consensus on who owns these coins. The interesting thing here is that this is a form of scarcity, but an informational scarcity. That 21 million-ish number is locked forever. If someone came along and tried to use some new coins they created, everyone else would know they are worthless.
Currencies always rely on scarcity in order to be valuable. People value gold because it is hard to find and dig out of the Earth. There is a low chance that someone can double the amount of gold in the world in a single day and devalue everyone’s holdings. So the physical scarcity gives people confidence to trust that gold will hold its value and protect their wealth.
Now whether you think bitcoin’s leaderless, consensus-maintained scarcity has any value…well, how much trust has been placed in the Federal Reserve to pragmatically manage the fiat dollar? The rabbit hole…
In any case, consensus on scarcity is achieved through a protocol backed by game theory and applied cryptography. Users place their trust in this protocol.
proof of work
Bitcoin outputs an ever-growing list of transactions involving virtual coins. Consensus covering this list, and only this list, is created because users trust the protocol instead of middlemen. The protocol is able to pull this off by leveraging some applied cryptography.
Let’s build this magic from the ground up.
A characteristic of cryptographic hash functions is leveraged over and over in Bitcoin: hard to reverse, but easy to verify. Compared to the simple algebra functions you are comfortable with, this is a very weird property. For example, let’s take the super simple function 2 + x = 5. If someone came up and said “Hey, x is 3.”, you could easily verify that by plugging it in: 2 + 3 = 5. But with some simple maths, you could have just as easily reversed the function yourself and calculated x = 5 - 2 = 3. Using really mind-bending maths, a hash function keeps the easy to verify part, but makes the reverse part impossibly hard. It is what we call a one-way function. A real world metaphor might be cooking pancakes. It’s easy for someone to try a bite of pancakes and verify “Yea, these are pancakes”, but it is much harder for them to guess the exact ingredients used to cook those pancakes. And to make this metaphor more like cryptographic reality, we need to imagine a world where pancakes can have infinite possible ingredients, not the common handful.
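To make hard to reverse, but easy to verify concrete, here is a minimal Python sketch using only the standard library’s hashlib (the million-guess cap is an arbitrary stand-in for “we give up”):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Easy to verify: given a claimed input, one hash call confirms the output.
target = sha256(b"pancakes")
assert sha256(b"pancakes") == target  # instant

# Hard to reverse: given only `target`, all we can do is guess inputs.
# For a real 256-bit digest the search space is ~2^256 -- utterly infeasible.
def brute_force(target: bytes, max_tries: int = 1_000_000) -> bytes | None:
    for i in range(max_tries):
        guess = str(i).encode()
        if sha256(guess) == target:
            return guess
    return None  # gave up; in practice we always give up

print(brute_force(target))  # None -- we never stumble on the preimage
```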
Leveraging this hard to reverse, but easy to verify property is how the protocol balances user incentives to generate consensus. The incentives influence each other, so we’ll just start with how a transaction is added to the list and go from there. But as the system comes together, you’ll see how the mathematical knot of one-way functions pushes and pulls users to a single spot. Consensus.
New Bitcoin transactions are added one group, known as a block, at a time to the list of all transactions ever, which is known as the blockchain. Which transactions go into the next block is determined by a Bitcoin user. But which user? Wouldn’t this user have a ton of power since they could dictate what transactions are “official” or who is allowed to send transactions? The user does in fact have this power, but to counter this, the protocol only allows for short-term, per-block, leaders. So why would a user even want this responsibility? Well it ain’t altruistic, there is a financial incentive. The leader collects all the per-transaction fees in the block that they add to the blockchain.
So now the tricky part. How does the protocol determine a leader? And why would other users trust this leader?
Blessing a leader is not a lottery; users must contribute work in order to have the chance to be the leader. The more work contributed, the higher the chance (linearly) to be the leader. What is this work? The leader needs to do the hard part of the hard to reverse, but easy to verify hash function property. They have to guess an ingredient in the pancakes. But whereas the set of pancake ingredients is extremely limited, the “ingredient” set for a bitcoin block is extremely massive. Think atoms-in-the-universe scale. Users aren’t sitting at their computers manually entering in random bytes, that would take forever. Turns out, computers are good at this guessing game, so a user delegates to their computer to do it a few million times a second for them. Computers don’t run on hopes and dreams though; these calculations require electricity. A real world limited resource which costs something.
About every ten minutes (this is fairly predictable due to an algorithm in the protocol, but we won’t get into that here) a user (well, their computer) guesses the right input and gets to add their new block. They are the leader for that block. All other users are able to do the second half of hard to reverse, but easy to verify and easily verify that work must have been done to create the block (…proof of work).
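As a rough sketch of that guessing game (a simplified model: real bitcoin double-SHA256’s an 80-byte block header against a difficulty target, and the names here are just for illustration):

```python
import hashlib

def mine(block_data: bytes, difficulty_bits: int = 16) -> int:
    """Grind nonces until the block hash falls below a target (proof of work)."""
    target = 2 ** (256 - difficulty_bits)  # smaller target == more guessing
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "little")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce  # the "right guess": expensive to find...
        nonce += 1

nonce = mine(b"block with some transactions")
# ...but trivial for everyone else to verify with a single hash:
digest = hashlib.sha256(b"block with some transactions" + nonce.to_bytes(8, "little")).digest()
assert int.from_bytes(digest, "big") < 2 ** (256 - 16)
```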
There is little incentive to broadcast bogus blocks since they can easily be detected and discarded. Proof of work allows bitcoin to be censorship resistant since there are no long term leaders with power to abuse. If Mallory really hates Bob and is thinking “If I ever become the leader, I am not going to include Bob’s transactions in the block, I hate that guy.”, Bob can just take his transactions to any other user attempting to be the leader (called a miner in bitcoin lingo). Even if Mallory becomes the leader for a block, that doesn’t influence any of the following blocks, she will have to do the work all over again and continue to be the leader in order to censor Bob.
There is a high risk in ignoring valid blocks. We have to sink into the blockchain structure to see why. The blockchain is the list of all previous blocks, and the blocks are chained together with the same hard to reverse, but easy to verify hash function, making it easy to verify the whole blockchain, as in, every transaction ever. If a block is modified somewhere in that chain, all blocks after it become invalid. So if a user verifies the blockchain, they know nothing has been changed or tampered with all the way back to the beginning of bitcoin time. This layers proof-of-works on top of each other, making it harder and harder to modify a block the deeper it gets in the chain.
Let’s say Mallory paid Bob for some pancakes and the transaction is now three blocks deep in the blockchain. As stated earlier, Mallory hates Bob, so she decides to try and change the transaction sending the funds back to herself. In order to do this, she needs to change the transaction in the block and re-calculate the block’s proof-of-work. But changing the transaction changes all the newer blocks as well, so she actually has to calculate proof-of-works for the next three blocks! Plus, the rest of bitcoin users are actively adding new valid blocks to blockchain, so Mallory has to do all of this faster than everyone else combined. Turns out this costs way more than the pancakes and there is almost no incentive to even try.
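A toy model of that chaining, reusing the mine sketch from above (hypothetical fields, not the real block format), shows why one edit cascades:

```python
import hashlib

def block_hash(prev_hash: bytes, txs: bytes, nonce: int) -> bytes:
    return hashlib.sha256(prev_hash + txs + nonce.to_bytes(8, "little")).digest()

# Each block commits to the hash of the block before it.
chain = []
prev = b"\x00" * 32  # the genesis block has no parent
for txs in (b"tx set 1", b"tx set 2", b"tx set 3"):
    nonce = mine(prev + txs)  # expensive proof of work for every block
    chain.append((prev, txs, nonce))
    prev = block_hash(prev, txs, nonce)

# Tampering with an old transaction changes that block's hash, which breaks
# the prev_hash commitment of every later block. Mallory must redo all of
# those proofs of work, faster than the honest network extends the chain.
```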
So what differentiates the bitcoin blockchain from another one which follows the same rules? The one with the most work poured into it (not necessarily the longest) is the blockchain (assuming the blocks are valid according to the network…some blocksize-wars history there I am side stepping for now). Using a blockchain with less work would just be risky for any user (“Hey man, I totally have 1,000 BTC, it’s just over here on this other blockchain which only three people use…”).
So to summarize, there is incentive to participate in mining and a low barrier of entry. There is no incentive to produce bogus blocks and high risk to ignore valid blocks…hey, distributed consensus! And kinda beautifully, the protocol can be boiled down to energy in => consensus out.
51% attack
A 51% attack is probably the most well known theoretical attack on bitcoin. Luckily, (as in that Satoshi person was pretty smart) the work in Proof of Work also helps protect against this kind of attack.
A 51% attack is when someone controls at least 51% of the work being poured into bitcoin. At this point, chances are they will guess the next block and thus be the short term leader. And as long as they hold at least 51% of the work, chances are that will happen again and again. With this power they could:
- Control what transactions get into the blockchain.
- Mine a “shadow” blockchain and perform a double-spend by swapping it with the public blockchain. It should be noted that this would be a public action though, so the rest of the network would be aware it’s happening.
This would suck, but it’s not actually that powerful. If an attacker could use the 51% to re-write transactions deep in the blockchain, somehow without it being obvious to everyone else, then it would be powerful. But that isn’t possible due to the proof-of-work chain. And as of 2021, the cost to control 51% of the bitcoin hashing power for just an hour is in the billions of USD. The cost-to-benefit of this attack hasn’t made sense since the early days of bitcoin (like 2012-ish).
If there ever is a successful 51% attack, it probably means the end of bitcoin, so maybe an attacker who wants to destroy bitcoin will try with this goal in mind. But if an attacker has enough power to even consider this attack, which today is limited pretty much to a handful of states, there are probably cheaper ways to try and destroy bitcoin (but good luck).
eclipse attack
An eclipse attack is kind of a more focused sybil attack. In a sybil attack, a bad actor floods a distributed network with nodes that they control in order to make it look like the network has strong consensus on something. In reality, it’s just one person’s opinion made to look like a lot of different people. An eclipse attack is when the target of the attack is just one person, not the whole network. An attacker singles out someone’s node and floods it with a lot of nodes all controlled by the attacker. In the context of bitcoin, an attacker might perform an eclipse attack on someone in order to try and trick them into thinking a different blockchain is the real one.
Proof-of-work again helps defend against this type of attack, and sybil attacks in general, because it doesn’t matter how many nodes an attacker peers to their target node: the target only needs one honest node to relay the real blockchain, and they can easily verify it. For an attacker to really gain anything, they would still have to produce valid blocks (work), so the attack is expensive and at the same time so cheap to beat. It’s probably not worth it.
blocksize
There is an “artificial” blocksize limit which caps the number of transactions per second on the bitcoin blockchain. Miners are generally incentivized to choose transactions which pay higher fees. The scarce resource in this market is bytes on the blockchain.
This leads to a few questions.
- Is there a “real” limit to the blocksize?
- Why have the artificial limit?
- Is this related to consensus?
The blocksize debate back in 2015 is bitcoin’s most famous holy war (so far!). As with all holy wars, there was a lot of noise coming from parties with different interests. I believe the noise was generated by parties which valued bitcoin succeeding quickly over bitcoin succeeding at all.
Bitcoin’s only value proposition is its consensus without leaders. If that is degraded, you might as well use a simple, centralized database at that point. So any changes to the bitcoin protocol should examine how they affect the value of the consensus. The blocksize limit determines what quality of computer is necessary to run a node: lower limit means lower quality. The lower the limit, the more distributed the network, the higher the value of the consensus. I don’t know what the “real” limit is, but increasing the limit even a little bit starts to eat away at bitcoin’s only value proposition. I don’t think it is worth it. Instead, second layer applications (e.g. Lightning Network) are responsible for increasing transactions-per-second for different scenarios and developing more value on top of the layer one consensus.
energy
The energy use of bitcoin causes a lot of fear. Bitcoin is essentially energy in => consensus out, and the left side of that equation is objectively measurable, while the right side is completely subjective. This is also true for any household appliance, but bitcoin’s energy use is a few orders of magnitude higher than a single appliance’s.
The fact that bitcoin requires energy isn’t a fault, it’s a feature. bitcoin’s value is its consensus. If, hypothetically, it required some super rare resource to mine bitcoin, that resource would most likely be controlled by just a few players. Once there are only a few users able to be leaders, bitcoin is less distributed, less robust, lowering the value of the consensus. Mining bitcoin only requires electricity and an internet connection. These low barriers to entry keep the consensus as distributed as possible, raising its value.
Another hypothetical, what if the resource required to mine bitcoin was virtual? What if the resource was bitcoin itself? This specific case goes by a whole new name: proof of stake. It is a popular idea because the consensus mechanism equation could become a lot less energy in => consensus out. But, like a lot of the consensus design trade-offs, I think the equation actually becomes a lot less energy in => a lot less valuable consensus out. This is difficult to analyze, but I believe the crux of the issue is that proof-of-stake adds a ton of complexity by introducing a weird incentive loop. Where bitcoin’s proof-of-work is anchored in the physical world (electricity), proof-of-stake is anchored in the virtual world. And that virtual world is the same world it is tasked with securing.
Digging into that a bit, it is easy to model proof-of-work miner incentives. Contribute energy, get bitcoin. If you stop contributing, you stop getting bitcoin. That’s it. A proof-of-stake miner “stakes”, or ties up bitcoin, so it can’t be used for anything else. They now earn more bitcoin. This is where things start to get weird. The miners can now just passively make more bitcoin. Do the rich get richer? Do the rich have the power to tweak the rules so they get even richer? The incentives are getting hard to follow.
I believe in the long run, proof-of-stake has strong centralization incentives which leads to weak, value-less consensus.
Transactions
Bitcoin is transactions and the blockchain orders them.
structure
A bitcoin transaction is made up of a set of inputs and a set of outputs. Each input maps to a previous transaction’s output. Transaction outputs which have not been spent, as in have not been mapped to an input of another transaction, are called “Unspent Transaction Outputs” or UTXOs.
{
  "version": 1,
  "locktime": 0,
  "vin": [
    {
      "txid": "7957a35fe64f80d234d76d83a2a8f1a0d8149a41d81de548f0a65a8a999f6f18",
      "vout": 0,
      "scriptSig": "3045022100884d142d86652a3f47ba4746ec719bbfbd040a570b1deccbb6498c75c4ae24cb02204b9f039ff08df09cbe9f6addac960298cad530a863ea8f53982c09db8f6e3813[ALL] 0484ecc0d46f1918b30928fa0e4ed99f16a0fb4fde0735e7ade8416ab9fe423cc5412336376789d172787ec3457eee41c04f4938de5cc17b4a10fa336a8d752adf",
      "sequence": 4294967295
    }
  ],
  "vout": [
    {
      "value": 0.01500000,
      "scriptPubKey": "OP_DUP OP_HASH160 ab68025513c3dbd2f7b92a94e0581f5d50f654e7 OP_EQUALVERIFY OP_CHECKSIG"
    },
    {
      "value": 0.08450000,
      "scriptPubKey": "OP_DUP OP_HASH160 7f9b1a7fb68d60c536c2fd8aeaa53a8f3cc025a8 OP_EQUALVERIFY OP_CHECKSIG"
    }
  ]
}
A transaction with one input and two outputs.
A UTXO contains two parts:
- An amount of bitcoin.
- A cryptographic puzzle, the scriptPubKey, which must be solved in order to spend the bitcoin.

The scriptPubKey name made sense historically, but now it would probably be better called “locking script” or maybe “witness script” (but more on that later). A fun fact about a bitcoin output: it must be spent in its entirety. This usually leads to a “change” output in a transaction which sends the extra bitcoin back to the sender.
Each input contains four parts:
- A transaction ID, referencing the transaction that contains the UTXO being spent.
- An output index (vout), identifying which UTXO from that transaction is referenced.
- A scriptSig, the script which satisfies the puzzle placed on the UTXO.
- A sequence number, a weird re-purposed field now used to enforce locktime and replace-by-fee.

The scriptSig is another legacy name; today it’s more like the “unlocking script” or “witness”.
For a transaction to be valid:
- All inputs must map to outputs which haven’t already been spent.
- All inputs must unlock the outputs they map to.
- The sum of the inputs must be larger than or equal to the sum of the new outputs. Any difference is an implicit transaction fee that is used to pay to get on the blockchain.
- The locktime must have passed (be it block time or height).
script
The scriptPubKey and scriptSig combine to form a little program to unlock a UTXO. The program is written in a programming language designed just for bitcoin called Script. Script is intentionally very constrained since these programs need to be verified all the time by all sorts of computers. It is not Turing-complete, so no loops or complex control flow, keeping program complexity very predictable (relatively).
To run a transaction script, the scriptSig + scriptPubKey are concatenated in that order (kinda feels backwards, but makes sense given how the program is executed). The program is executed from left => right. All script programs only contain two types of tokens: opcodes and data. Data is simply loaded onto the stack. Opcodes can pull data off the top of the stack, do some calculations, and load more data onto the stack. A program unlocks a UTXO if it can (1) run to completion and (2) end with a single non-zero value on the stack (a 0 or an empty array counts as false; this used to be a little more open, but was made stricter with the modern CLEANSTACK rule).
An extremely simple (and insecure) program example is a scriptPubKey of 3 OP_ADD 5 OP_EQUAL. The UTXO in this case is locked by a script which requires a scriptSig of 2 to unlock. The resulting transaction script would be 2 3 OP_ADD 5 OP_EQUAL.
The program execution:
1. 2 is loaded on the stack.
2. 3 is loaded on top of 2 on the stack.
3. OP_ADD pops the two data values off, adds them together, and puts 5 on the stack.
4. 5 is loaded on top of the 5 on the stack.
5. OP_EQUAL pops the two data values off, compares them, and puts a TRUE on the stack.

Since the program ran to the end and has a TRUE on the stack, the output is “unlocked”. Pretty simple! The only issue with this example is that anyone could unlock the UTXO as long as they understood simple addition. Luckily, bitcoin supports a few more opcodes that make it useful.
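Here is that execution as a toy Python stack machine (nothing like the real Script interpreter, just the two opcodes from the example):

```python
def run_script(tokens: list) -> bool:
    """Toy Script interpreter: supports data pushes, OP_ADD, and OP_EQUAL."""
    stack = []
    for token in tokens:
        if token == "OP_ADD":
            a, b = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "OP_EQUAL":
            a, b = stack.pop(), stack.pop()
            stack.append(a == b)
        else:
            stack.append(token)  # data is simply loaded onto the stack
    # Unlocked iff we ran to completion with a single truthy value left.
    return len(stack) == 1 and bool(stack[0])

# scriptSig (2) + scriptPubKey (3 OP_ADD 5 OP_EQUAL), concatenated in that order.
assert run_script([2, 3, "OP_ADD", 5, "OP_EQUAL"]) is True
```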
Fun fact, a lot of bitcoin documentation is not explicit about “data push” operators which are necessary to push any data on the stack. Maybe docs kinda gloss over it since there are quite a few ways to go about it and it is a bit confusing.
data pushes
Opcodes 1-75 are used to push data on the stack: opcode 1 pushes the next byte onto the stack, opcode 2 pushes the next two bytes, and so on. This obviously tops out at 75, so opcodes 76-78 are special: 76 (OP_PUSHDATA1) means the next byte contains the number of bytes to push, 77 (OP_PUSHDATA2) uses the next two bytes, and 78 (OP_PUSHDATA4) the next four.

In bitcoin docs, you often see data pushes in script simplified to just angle brackets, <data>, instead of explicitly stating the necessary data push opcode.
OP_0, or OP_FALSE, is not a no-op; it pushes an empty array to the stack. You see this used to cancel out the multisig bug which requires some extra data to be popped off the stack. OP_1, a.k.a. OP_TRUE, is defined as opcode 81 (!) since 1 is already taken for the data push. There is a specific no-op operator though: 97 is OP_NOP.
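As a sketch, a helper that picks the minimal data-push encoding described above might look like this:

```python
def push_data(data: bytes) -> bytes:
    """Encode the minimal data-push opcode(s) for `data`, per the rules above."""
    n = len(data)
    if n <= 75:
        return bytes([n]) + data                             # direct push, opcode == length
    elif n <= 0xFF:
        return bytes([76]) + n.to_bytes(1, "little") + data  # OP_PUSHDATA1
    elif n <= 0xFFFF:
        return bytes([77]) + n.to_bytes(2, "little") + data  # OP_PUSHDATA2
    else:
        return bytes([78]) + n.to_bytes(4, "little") + data  # OP_PUSHDATA4

# A 20-byte hash gets the simple form: one length byte, then the data.
assert push_data(b"\x11" * 20)[0] == 20
```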
timelocks
A timelock is a restriction on when a transaction, or part of a transaction, is considered valid. There are a few different ways to do this with bitcoin transactions. But first, why would someone want to do this? It might be a little counterintuitive: if you are sending a transaction to pay for something, why would you want to restrict when the transaction is valid? Don’t you need it to be valid immediately? For a payment that is probably true, but more complicated contracts leverage restrictions like timelocks. For example, Alice could sign a transaction and give it to Bob as collateral: “Hey Bob, here is a transaction paying you, but it isn’t valid for 30 days. I might pay you sooner, which would void this transaction (can’t double spend an output).” The locktime can be set in block height or a unix timestamp, but in either case it is an absolute time.
Transactions have the locktime field to control when a whole transaction is valid. This is enforced by consensus rules, so miners won’t attempt to include them in a block beforehand. I see “timelock” vs. “locktime” terminology mixed and matched, but I am using “timelock” as the more general term for all methods to restrict a transaction based on time. A shortcoming of the locktime timelock is that there is nothing stopping users from spending a UTXO before the locktime. In our Alice and Bob example, Bob has a transaction he can broadcast in 30 days to collect the funds, but Alice could broadcast a different transaction beforehand which doesn’t have a timelock and spends the same outputs, leaving Bob with a worthless transaction. Some script opcodes were added to address this. BIP65 added OP_CHECKLOCKTIMEVERIFY, which inspects the spending transaction and enforces that it has a locktime set after the specified time.
<Bob's pubkey> OP_CHECKSIGVERIFY <now + 30 days> OP_CHECKLOCKTIMEVERIFY
Standard pubkey check extended with an absolute timelock.
With CLTV, Alice can now broadcast a transaction which contains an output that requires Bob’s signature, but also a timelock of 30 days. Now Bob knows the funds are his for sure in 30 days. It is another set of tradeoffs, but a useful primitive to build with.
Absolute timelocks are useful for restricting exactly when a transaction or output is valid. But sometimes you need something a little more relative. Relative timelocks were added with BIP68. Instead of being an absolute time, these timelocks depend on the confirmation time of the transaction being spent. “The transaction must be buried by 10 blocks before the UTXO is spendable”. So instead of having to guess the absolute time when creating the transaction, the clock starts ticking when the transaction is confirmed.
The interface of relative timelocks mirrors the absolute locktime one, with a transaction field and an opcode. But there is a bit more complexity since they were added in a backwards compatible fashion and kinda shoe-horned in. Relative timelocks use the sequence field on an input. So in bitcoin lingo, locktime == absolute and sequence == relative. The relative opcode mirrors the name: OP_CHECKSEQUENCEVERIFY. The opcode works the same basic way, enforcing that an input has a sequence which forces the UTXO being spent to have been confirmed for a certain number of blocks or amount of time.
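A sketch of building a BIP68 relative-timelock sequence number (per BIP68: bit 31 set disables the lock, bit 22 switches the units to 512-second chunks, and the value lives in the low 16 bits):

```python
SEQUENCE_LOCKTIME_TYPE_FLAG = 1 << 22  # set: time-based (512s units); unset: blocks

def relative_locktime_sequence(value: int, in_blocks: bool = True) -> int:
    """Build an input sequence encoding a BIP68 relative timelock."""
    assert 0 <= value <= 0xFFFF  # the value lives in the low 16 bits
    seq = value
    if not in_blocks:
        seq |= SEQUENCE_LOCKTIME_TYPE_FLAG
    return seq  # bit 31 left unset, so the relative lock is enabled

# "Spendable only once the funding tx is buried by 10 blocks":
print(hex(relative_locktime_sequence(10)))
# "Spendable only after ~1 hour" (7 * 512s = 3584s):
print(hex(relative_locktime_sequence(7, in_blocks=False)))
```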
Both timelock opcodes, OP_CHECKLOCKTIMEVERIFY and OP_CHECKSEQUENCEVERIFY, force a transaction to have an “at least” timelock on it. Kinda interesting that it is explicit on both sides; I am not sure of the benefit of that yet other than making sure both parties are on the same page, I’ll think through it. I also initially wondered why the absolute timelock is at the top transaction level, whereas the relative timelock is per input. But it makes sense if you walk through it: the relative timelocks are relative to the UTXOs they are spending, and those could be different for each input. The absolute timelocks are absolute, so if any of them have not “passed” yet the transaction is not valid, so might as well just put the one furthest out in the future at the transaction level. Maybe there would be a bit more flexibility for transactions built by multiple parties if it was also on the input level, but apparently there hasn’t been much demand for that.
Timelocks are heavily used in the “layer 2” Lightning Network.
OP_RETURN
OP_RETURN is a weird opcode which fails the Script program immediately. It’s a way to embed some data into the blockchain, but it doesn’t burden the UTXO set since nodes can recognize that any output script with OP_RETURN is un-spendable and not worth keeping in memory.
P2PKH
A bitcoin script can define many different and clever ways to lock an output. But some patterns are so useful, a lot of the bitcoin ecosystem is tailored around them. One of the popular OG (but not the oldest) patterns is Pay to Public Key Hash.
Pay to Public Key Hash, P2PKH, is a pattern to send bitcoin to a single person. In this case, a person holding the private key of a public/private key pair. If the user wants to spend the bitcoin in this UTXO (for example, send it to another public key hash), they need to use their associated private key to provide a signature which solves the scriptPubKey. The check signature operator, OP_CHECKSIG, is essential for this script.
OP_DUP OP_HASH160 <PubkeyHash> OP_EQUALVERIFY OP_CHECKSIG
Public key hash locking script.
The OP_DUP OP_HASH160 <PubkeyHash> OP_EQUALVERIFY prefix makes sure that the OP_CHECKSIG can only be run with the intended user’s public key. Without this, a user could provide any public key and an associated signature.
OP_CHECKSIG returns true if a signature signs the correct parts of a transaction and matches the provided public key. The really interesting part here is: which parts of the transaction are hash’d for this signature? Transactions signal which parts of the transaction a signature covers through the SIGHASH flag. It is safe to assume that the owner of the public key is cool with whatever parts of the transaction are covered by the signature and that they haven’t been tampered with by a bad party.
So the P2PKH script ensures that only the owner of the public key can use the bitcoin.
A public key is a form of identification and is heavily used in the bitcoin ecosystem, so what’s a “public key hash” and why is it better than just using a plain old public key?
Blockspace is limited and costs money, so any way to save a few bytes is useful. Hashing the public key is a good security-to-performance trade-off since it really doesn’t hurt the security aspect at all, but shaves off quite a few bytes. bitcoin is a bit quirky and takes the pubkey, sends it through SHA256 and then through RIPEMD160 to create the hash. We would have to ask Satoshi why both, but we end up with a value that takes up less space.
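That quirky double hash is easy to reproduce with Python’s hashlib (assuming your OpenSSL build ships ripemd160, which not all do):

```python
import hashlib

def hash160(pubkey: bytes) -> bytes:
    """The bitcoin pubkey hash: SHA256 first, then RIPEMD160 (32 -> 20 bytes)."""
    sha = hashlib.sha256(pubkey).digest()
    return hashlib.new("ripemd160", sha).digest()

# A 33-byte compressed pubkey becomes a 20-byte hash for the P2PKH script.
print(hash160(bytes.fromhex("02" + "11" * 32)).hex())
```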
addresses
Bitcoin addresses are part of the tooling built on top of popular script patterns to make them easier to use. Bitcoin addresses are short strings which encode virtual spots on the blockchain to send bitcoin. Bitcoin wallet software knows how to decode these addresses and construct the script the address describes.
For P2PKH, the address always starts with a 1 (different numbers for different patterns) so it’s easy for a human to quickly know what they are dealing with. These addresses also encode information so that computers can verify we humans haven’t fat-finger’d a number and just sent some bitcoin to /dev/null.
Base58Check was the original bitcoin address encoding scheme, but since the SegWit softfork, bech32 is the new hotness.
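For a feel of the original scheme, here is a compact Base58Check sketch (standard alphabet; version byte 0x00 for P2PKH, which is exactly why those addresses lead with a 1):

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(version: bytes, payload: bytes) -> str:
    """Encode version + payload with a 4-byte double-SHA256 checksum."""
    data = version + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    num = int.from_bytes(data + checksum, "big")
    out = ""
    while num:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is encoded as a literal '1'.
    pad = len(data + checksum) - len((data + checksum).lstrip(b"\x00"))
    return "1" * pad + out

# P2PKH address: version 0x00 + a 20-byte pubkey hash (taken from the example tx above).
print(base58check(b"\x00", bytes.fromhex("ab68025513c3dbd2f7b92a94e0581f5d50f654e7")))
```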
The term “address” may not be the best from a usability perspective. It gives the impression that an address should be used consistently, like a bank account number. But addresses should be thought of more like invoices. Address re-use degrades privacy and security. Privacy is degraded because the blockchain is public info; users can analyze it and start to triangulate an identity. Address re-use makes their job way easier to not only identify the re-user, but also those who send to the re-user. Security is degraded with address re-use because it opens the user up to niche attacks where a private key can be stolen. These attacks are pretty difficult to pull off in reality, an attacker would need to derive the private key from the multiple public signatures, but there is no reason to risk it.
fees
The blocksize limits the number of transactions that can fit in a block. How many transactions? That depends on the size (bytes) it takes to describe a transaction. One can imagine a transaction which uses a bunch of UTXOs to pay a big sum to someone. This transaction requires a lot of unlocking scripts, one for each UTXO. This is going to take more bytes on the blockchain than a transaction which only uses a single, simple UTXO. If a miner has to choose between one big transaction or a few small ones, all other things equal, it will grab the small ones and make more on fees. So the big one needs to put up a larger fee. The sats/byte ratio is used to see what it would take to get a transaction into the blockchain given the current market.
What if you create a transaction and the fee is too small? Wait it out? That is one option, but two tools are available to bump the fee of a transaction.
The first is called “replace by fee” (RBF). This is a policy where node operators will replace an existing transaction in the mempool with a new one which spends at least one of the same UTXOs, but pays a higher fee. An important point here is that this is just a policy, not part of the bitcoin protocol. Miners are free to include any transaction in a block that they want, including an old one that a user attempted to bump a fee on (although miners don’t have much incentive to grab a tx with lower fees). But because this is just a node policy, there can be many wrinkles to exactly how a node and its operator support RBF. BIP125 introduced an RBF policy (nowadays called “opt-in” RBF) which leverages the weird sequence number field on a transaction. If a transaction uses a sequence number less than ffffffff then it is signaling that it can be replaced (opting in) by a transaction with a higher sequence number. That new transaction has to have a higher fee than the original (if following this policy). An alternative policy gaining traction these days is called “full RBF”, which allows any transaction to be replaced, no opt-in necessary. Something to note about any RBF policy is that only the original transaction creator(s), with the secrets to unlock the UTXOs, can create a new transaction to bump the fee.
A second tool to bump a transaction fee is a pattern called “child pays for parent” (CPFP), also known as “ancestor feerate mining”. This is where a second transaction is created which spends an output of the first transaction. A relatively large fee is placed on this second transaction in order to incentivize miners to include both the original stuck transaction and the new one in a block. One nice aspect about CPFP is that usually either the sender or the receiver can decide to create a CPFP transaction. The sender would attach it to the original transaction’s change output, which is headed back to them, whereas the receiver would attach it to the UTXO headed to them. Technically any number of transactions could be chained together (I guess the upper limit being the number of transactions in a block? Gotta think on that.) and as long as the last one has a large enough fee, a miner could choose to include them all in a block. However, most nodes have a policy to only re-broadcast transactions with fewer than 25 unconfirmed ancestors.
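The miner’s side of CPFP is just arithmetic: the stuck parent is only attractive if the package feerate (combined fees over combined size) beats the market. A quick sketch with made-up numbers:

```python
def package_feerate(fees_sats: list[int], sizes_vbytes: list[int]) -> float:
    """Effective feerate a miner sees when it must take the whole package."""
    return sum(fees_sats) / sum(sizes_vbytes)

parent = (200, 200)   # 200 sats fee over 200 vbytes -- 1 sat/vb, stuck
child = (5_800, 150)  # generous child fee spending the parent's output

print(package_feerate([parent[0]], [parent[1]]))                      # 1.0 sat/vb alone
print(package_feerate([parent[0], child[0]], [parent[1], child[1]]))  # ~17.1 sat/vb together
```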
Both fee bumping mechanisms are exposed to “pinning” attacks where someone could attach a new transaction to the original which makes it either prohibitively expensive to pay for all the fees or in some way breaks node policies so the new transactions won’t be re-broadcast’d.
dust
Dust is a UTXO which holds so little value that it is more expensive to pay for a transaction to get on the blockchain than the output’s value. This is a moving target given the market for blockchain space is always changing.
Even in the best case scenario for a transaction trying to spend “dust” UTXOs, where there are tons of cheap (size-wise) inputs and only one output whose cost is amortized, there is a physical minimum size of a UTXO input unlock script. In this best case scenario, the input is for a taproot UTXO and would be around ~60 vbytes. The cost of this input alone is determined by the current blockspace fee; let’s say it’s 15 sats/vbyte. That means the UTXO needs to hold more than 60 * 15 = 900 sats to be economically viable.
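That arithmetic as a tiny helper (the ~60 vbyte input size from above is the assumption here):

```python
def is_dust(utxo_value_sats: int, feerate_sat_per_vb: float, input_vbytes: int = 60) -> bool:
    """A UTXO is dust if spending it costs more than the value it holds."""
    return utxo_value_sats <= input_vbytes * feerate_sat_per_vb

print(is_dust(900, 15))    # True  -- worth exactly the fee to spend it
print(is_dust(5_000, 15))  # False -- still economically viable
```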
It is easy to see the incentive of doing some house cleaning when fees are low, combining dust UTXOs into one so that the produced UTXO is still economically viable when the mempool is full later on.
mempool
The mempool (Memory Pool) is the backlog of transactions waiting to be confirmed on the blockchain. It is often described as “The mempool”, but there isn’t actually a single mempool somewhere. Bitcoin consensus only covers the transactions in the blockchain, not those waiting to get in. Every bitcoin node, including miner nodes, has its own mempool. As nodes gossip new transactions that want to get onchain they add them to their mempool. So even though there isn’t “a” mempool, everyone’s pools are probably pretty similar at any given moment.
While the mempool isn’t covered by consensus, there are strong “policy” conventions established by Bitcoin Core. These policies attempt to create the premier market for pricing transactions into the blockchain, with the goal of keeping things as distributed as possible. It is always possible for users to go directly to miners with a transaction, bypassing these policy rules. But users and miners both have high, aligned incentives for this distributed market. Miners want to be able to easily grab the most valuable transactions to put in a block, a.k.a. the ones that give them the most fees. Users want a very clear picture of the current fees in order to pay the lowest fees possible for their transactions.
grinding
The term grinding gets tossed around a bit, and funny enough it is kinda similar to mining, but maybe we just wanted a new name. When we hash something, by definition we have no idea where it is going to end up in the possible output set. But there are times we want that output to have certain characteristics, so we “grind” inputs until the output has them. When blocks are mined, miners are grinding for hashes with long prefixes of zeroes. And those are controllably very rare. But there are other scenarios where the characteristic is a 50/50 shot. An example is ECDSA R value size.
Witness Improvements
Bitcoin is a technical marvel and might just save us all, but it wasn’t born perfect. Bitcoin Improvement Proposals (BIPs) are technical proposals to improve the system. BIPs come in all shapes and sizes, addressing any part of the bitcoin system. Many of them have been merged into the protocol over the years.
Of all the BIPs merged so far, I don’t think it is too crazy of a take to say that some have had a disproportionate impact on the protocol. A particular set of BIPs I am vaguely calling the “Witness Improvements” have probably had the largest impact. These BIPs were merged in three big steps: P2SH, SegWit, and Taproot.
Each step, comprised of one or more BIPs, brought some new functionality to the protocol. But all the “Witness Improvements” share some common goals.
- Make bitcoin more efficient, enabling decentralization.
- Make bitcoin more private, making it more useful.
- Remain backwards compatible so it remains useful.
The technical aspect that ties these changes together is that they focus on the data which historically (spoiler!) lived in the scriptSig part of the bitcoin transaction.
P2SH // BIP16
Pay to Script Hash, P2SH, scripts were standardized in BIP16. They allow transactions to be sent to a script hash (address starting with 3). Why is this useful and such a game changer?
Script is pretty powerful, but it puts the burden on the sender to come up with the appropriate locking script. For P2PKH, this is really easy because there is a standard address to send to a person. But what if a user wants to create a shared UTXO where any one of three people could spend it? They need to get this multisig script to the sender and hope they copy/paste it right. P2SH allows the receiver to define the locking script and just send the hash to the sender instead. This “receiver defines the invoice” pattern is closer to how transactions actually happen in real life, with the receiver usually defining an invoice which the payer agrees to pay.
Verifying a P2SH transaction is a whole different beast. The P2SH locking script pattern is recognized by node software as a “special” form, a template of sorts. So it gets executed slightly differently to “normal” legacy scripts. Two code paths is a heavy cost in complexity, but the benefits are too good here.
OP_HASH160 <20-byte-hash-of-redeem-script> OP_EQUAL
A P2SH output since it matches the specified template.
OP_0 <sig A> <sig B> <redeem-script>
Example input script showing data outside (or to the left of) the redeem script. Sidenote: the OP_0 is there because of the multisig bug.
- The input and output scripts are concatenated like always and executed to verify that the given redeem script’s hash matches the one committed to in the UTXO. If it checks out, old interpreters which don’t know about P2SH will just see the TRUE on the stack and think all is good (CLEANSTACK doesn’t apply here, so the rest of the data on the stack is ignored).
- Now if the interpreter is P2SH aware, it recognizes the template of the UTXO and runs another validation step. The redeem script is decoded (a.k.a. “unpacked” into the running program) and run on a new stack which doesn’t have that TRUE data on top.
I am not sure of the use cases, but technically consensus allows non-data-pushes in a P2SH input script (outside of the redeem script). Data outside the redeem script is technically required for P2MS scripts like the one above to deal with the OP_0 bug. But standardness rules might not be as forgiving as consensus (scriptsig-not-pushonly).
Random thought, but what happens if a user locked up some coins with a script which by chance matches the P2SH template? They are not locking to a script, but just some random preimage. Do nodes still try to then decode the preimage as a script and fail, locking the coins forever? Sounds like the answer to this is simply “yes”, but there were not many scripts matching the template when P2SH was activated.
If we could go back in time and make one change to the original protocol, I think this is it. The invoice model is much easier for users to use and understand. We would also be able to drop tons of complexity in the protocol if the original flows didn’t exist at all. But, live and learn.
SegWit // BIP141 + BIP143 + BIP144 + BIP145
The SegWit (“segregated witness”) upgrade was a large bundle of BIPs proposed back around 2015 and activated in 2017.
One goal of SegWit was to fix transaction malleability. Transaction malleability allows transactions to be tweaked in ways that leave them almost entirely the same, but with a new ID. An example tweak is padding the signature script with additional instructions that aggregate to nothing (OP_DUP OP_DROP). The new transaction ID means any transactions based on the old ID are now invalid. This makes it really difficult to lock in the digital contracts necessary for things like the Lightning Network.
Another big goal was to fix the quadratic sighash issue. In pre-SegWit bitcoin, the cost to create and verify a transaction’s signatures scales quadratically (O(n^2)). This isn’t great because it eats away at bitcoin’s leaderless consensus value prop by bogging down small nodes, pushing towards centralization. It wasn’t obvious to me why this was quadratic. But the crux is that the hash computation is dependent on the size of the data being hashed (which makes sense, but for some reason I was just thinking constant time there). So pre-SegWit, each of n inputs has to sign a message that itself grows with n, hence n x n. SegWit addressed this by coming up with a new signature scheme where the transaction parts are hash’d once before being signed for each input. This way the per-input cost is constant for real.
A third improvement was a change in how bitcoin addresses are encoded: from Base58Check to Bech32. There are two big benefits to Bech32. First, it uses a character set that doesn’t mix uppercase and lowercase letters, protecting human users from making mistakes. And second, the checksum validation is much stronger, in some scenarios even allowing mistakes to be corrected, not just detected.
There are other improvements bundled in the big SegWit change that I am skipping for now, but one last big requirement was to make all the changes backwards compatible (otherwise known as a “soft” fork). This just means that clients running old code wouldn’t break, but it doesn’t require that they get the new benefits. The fact that this major of a change went in as a softfork is kinda a minor miracle from an engineering perspective.
So how was this pulled off?
The big change was moving the witness data (the input unlock scripts) into a separate data structure in a block. The transaction data that describes its effects (where bitcoin is coming from and going to) remains in the original location. Kinda clever: the witness commitment is stuck in a coinbase output which uses an OP_RETURN so that old software ignores it (backwards compatible), but new software knows where to look to verify the scripts. And very important, the witness data is still committed to by the block; it is included in the block header hash via that commitment. Changing the witness data would require a new proof of work for the block.
Before SegWit, it was pretty simple for miners to calculate the most cost-effective transactions to put in a block. A block had (has? depends…) a max size of 1MB. A miner just needs to maximize tx fees / tx bytes. Small transactions (as in script size bytes, not value) with high fees are great! Large transactions with small fees suck! Easy. But how are SegWit transactions measured now that part of the transaction is stored somewhere else? There are a lot of things to weigh here.
One of which is that witness data, pre or post SegWit, never ends up in the UTXO set (the in-memory set of spendable outputs every node maintains). This means it puts less of a burden on the bitcoin system than the other parts of the transaction. Perhaps this data should get a relative “discount” to encourage more usage here? Another factor is that the old pre-SegWit nodes have a 1MB blocksize limit, and changing that would be a hardfork (these nodes wouldn’t accept a 1MB+ block). While post-SegWit witness data is still stored in a block, it is stored in a new spot which is not included in the old 1MB calculation.
SegWit introduced a new calculation to figure out the “weight” of a transaction. These weight units (WU) are more abstract than the straightforward bytes of old, but not too complex. The new policy for blocks is that they can only be 4M WU, instead of 1M bytes (1MB). But how are the weight units calculated? 1 byte of base data is 4 WU, while 1 byte of witness data is 1 WU. Two really key points here. First, old pre-SegWit blocks are made up of 100% base data, so the blocksize rule remains the same in their eyes (backwards compatible). Second, the witness data is getting a discount. If a transaction can be re-written to move more of its logic into the witness data, then it will be cheaper to get on the blockchain. This incentivizes putting much less burden on the UTXO set.
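The weight formula in a couple of lines of Python (vbytes, the unit modern feerates are quoted in, are just weight divided by 4, rounded up):

```python
import math

def tx_weight(base_bytes: int, witness_bytes: int) -> int:
    """SegWit weight: base data counts 4x, witness data counts 1x."""
    return base_bytes * 4 + witness_bytes * 1

def tx_vbytes(base_bytes: int, witness_bytes: int) -> int:
    """Virtual size used for feerates: weight / 4, rounded up."""
    return math.ceil(tx_weight(base_bytes, witness_bytes) / 4)

# A legacy tx (no witness) pays full price; a SegWit tx moves bytes to the
# discounted side and gets cheaper without changing the 4M WU block limit.
print(tx_vbytes(250, 0))    # 250 vbytes
print(tx_vbytes(150, 100))  # 175 vbytes
```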
I am still looking for an ah-ha moment for why the witness data is given a 1:4 discount, versus something like 1:5 or 1:10. We could have rolled with no relation at all, but this would have added another dimension to the miner knapsack challenge for choosing transactions to put in a block. That needs to be avoided at pretty much all costs since it would lead to miner centralization as specialization occurs. But why 1:4? It appears that in practice, 1:4 is a good spot where input and output costs are generally equal, which is nice to keep incentives easy to reason about. I am not sure if this was totally on purpose or just a happy accident, but it appears to kinda fix an issue with legacy transactions where outputs were much cheaper to create than inputs. In other words, if outputs are much cheaper to create than inputs, it pushes users to spend fewer UTXOs and create more small change UTXOs. This puts a burden on the UTXO set. On the other hand, we probably wouldn’t want to make inputs too cheap since it would start to eat away at privacy if everyone was consolidating. In any case, this is all still covered by consensus rules and is totally deterministic, so whether it is the perfect ratio or not, this is the way.
If the signature data is stored somewhere else, what do SegWit inputs and outputs look like? A transaction paying to a SegWit address has an output which just loads data onto the script stack. The input’s scriptSig (the old spot for the unlock script) is empty (the whole point of all this!). Check out the backwards compatibility though: if an old client validates this transaction, the script will end with data on the stack, so the transaction is valid. To old nodes it looks like an “anyone can spend” transaction. Newer clients however recognize this SegWit data on the stack and perform further validations. Transaction malleability isn’t possible with SegWit transactions since the scriptSig is now empty, its contents moved to the new witness field. And while the txid doesn’t commit to the new field, in order to avoid malleability, the new wtxid does. These two IDs now do the job txid used to have. The txid continues to be used to chain transactions, but is safe to use with un-broadcasted transactions since there are no malleability concerns. And the wtxid is used by miners to commit the whole transaction to a block, so the witness data is still “covered” and cannot be changed once confirmed.
SegWit introduced two new common transaction addresses to match the most used P2PKH and P2SH: P2WPKH and P2WSH. SegWit enabled wallets know how to decode these addresses and piece together the familiar script templates, but pull the data from the new spots. For interop with old clients, it’s even possible to embed a SegWit script into the old P2SH.
SegWit outputs for P2WPKH and P2WSH have very specific templates: a version byte plus either a 20-byte data push or a 32-byte data push, respectively. For the initial SegWit softfork, the version byte is just 0 for v0. A SegWit enabled node recognizes these templates and knows to look in the segregated witness structure to validate inputs spending them.
witness stack
The witness data is no longer stored directly in the scriptSig field, but over in a witness stack. There is one stack per input.
The witness validation execution is similar to P2SH. The special validation is enabled by the output matching a template, but for SegWit it is a version byte and then a 32-byte hash of the witness script (the name for the redeem script in this context). The witness stack looks like a P2SH input, with data elements first and the witness script at the end. The witness script is hashed and checked against the UTXO’s hash, but this is now implicit (vs. the explicit P2SH output template). The script is then deserialized and run against the data provided in the stack.
Taproot // BIP340 + BIP341 + BIP342
Taproot was the third big softfork bundle of improvements, activated in late 2021. It extends upon SegWit, which extends upon P2SH.
A quick summary of these big leaps in the bitcoin protocol:
- P2SH moved the locking script from the output to the input.
- SegWit moved the script out of the transaction.
- Taproot took advantage of both to ease script restrictions.
I think it’s easiest to start with what P2SH addressed. Before P2SH, payers were required to describe locking scripts in the transaction outputs. This makes sense from the payer-defines-the-transaction perspective, but generally, it’s the payee who knows how they want to lock up funds. Having the locking scripts in the outputs also means they end up in the expensive in-memory UTXO pool of every node. Pay To Script Hash flips this and puts the locking script in the input that unlocks the output. The output now just contains a hash of the script. For the output to be unlocked, a user must provide a script that hashes to the output’s hash, and the script must return true.
SegWit improved the performance of the system by moving the unlock scripts out of the transactions, allowing them to be pruned in most parts of the system.
So P2SH and SegWit were game-changers, but some weaknesses remained. First, the entire unlock script needs to be posted to the blockchain in order to unlock an output, even if only one of many sub-branches is relevant. Imagine a script with many if statements where only one of them actually unlocks the output, but the rest are still sitting there on the blockchain. This has a negative performance and privacy impact. For performance, there are a lot of wasted bytes carrying around those extra unused logic paths. Nodes have to enforce some “max size” settings to protect against a large script bogging down the network, and this limits scripts even if the code path that ends up getting used is relatively small. For privacy, the extra paths disclose unnecessary information. For example, on a lightning channel close, the transaction posted to the blockchain gives away enough information that it’s a safe bet both parties are operating lightning nodes.
Taproot fixes the remaining performance and privacy issues of P2SH and sets the stage for future upgrades. The two keys to the upgrade were Schnorr Signatures (BIP340) and Merklized Alternative Script Trees (MAST) (BIP341), which were then codified in script with BIP342. And with these performance benefits in place, taproot scripts have more relaxed rules. Let’s dig into it.
Schnorr signatures are pretty much better in every way than bitcoin’s historically used ECDSA signatures. This makes sense because ECDSA was developed just to get around patent issues with Schnorr, so it’s essentially water’d down. The Schnorr patents have since expired, so it is now free to use! This opens the door for interesting signature use cases which are now way more performant. The term “scriptless scripts” has popped up to describe these new flows since they generally take Script patterns and bake them into the signatures themselves.
First up is simple key aggregation (linearity). Public keys can easily be summed together to form a new public key, which can be used in n-of-n multi-signature scenarios, sometimes called scriptless multi-signatures. Historically, multi-signature scripts required signatures from all n users. Now these look like simple one-signature scripts. This helps both from a performance and privacy perspective since only one sig ends up on-chain. There is coordination complexity though to create that one signature from multiple parties. Codifying a safe and performant way to do that dance is being done in the MuSig work. Other scriptless script patterns include threshold signatures where k-of-n users can create a signature (checkout FROST) and adaptor signatures where a secret can be baked into a signature (these bring PTLCs to the Lightning Network).
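To see the linearity without real elliptic curve code, here is a toy Schnorr demo over a tiny multiplicative group (p=23, q=11, g=2: wildly insecure parameters, and it ignores the rogue-key problems MuSig exists to solve, but the algebra of two signature shares summing into one valid signature is the same):

```python
import hashlib

# Toy Schnorr group: g=2 has order q=11 mod p=23. Illustration only --
# real bitcoin uses the secp256k1 curve and the MuSig protocol for safety.
p, q, g = 23, 11, 2

def H(*vals) -> int:
    data = ":".join(str(v) for v in vals).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def sign_share(x: int, k: int, e: int) -> int:
    return (k + e * x) % q

# Two signers with private keys x1, x2 and nonces k1, k2.
x1, x2, k1, k2 = 3, 7, 5, 9
P = pow(g, x1, p) * pow(g, x2, p) % p  # aggregated pubkey: one key onchain
R = pow(g, k1, p) * pow(g, k2, p) % p  # aggregated nonce
e = H(R, P, "message")

s = (sign_share(x1, k1, e) + sign_share(x2, k2, e)) % q  # shares just add up

# Single standard verification; no hint that two parties were involved.
assert pow(g, s, p) == R * pow(P, e, p) % p
```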
The other big change in the softfork was Merklized Alternative Script Trees (MASTs). These are a new (for scripts) data structure that allows a script to reveal only the path which is used to unlock an output. A script which historically would have some if statements can instead be modeled as a tree, with each leaf node being a possible unlock. So a classic lightning payment channel HTLC output, which has a timelock clause to pull back funds on a failed transfer, would only expose this clause if it has to be used. This makes it much less obvious that the transaction is part of the lightning network. And even better, the unlock script only has to publish the “leaf” it’s using to unlock, freeing up all the wasted bytes which used to be spent describing the rest of the possible paths.
With all these performance improvements in place, some “standardness” policy rules were relaxed on the p2p network layer.
outputs, inputs, and script
So how does this look in practice? In the post-P2SH world, there have always been two common patterns: pay-to-public-key and pay-to-script. Taproot merges these back together since every output is now a tree instead. If we just look at the SegWit v0 (post-SegWit softfork) templates, P2WPKH and P2WSH, those outputs are always 22 bytes and 34 bytes respectively. It is obviously very easy to tell these apart onchain. Taproot still supports these two use cases, but every Taproot output is just 34 bytes (btw, the transaction size calculator is a nice tool to get an idea of things). It is no longer easy to see what an output is until it is spent, and even then, it might still be impossible to guess if it is a scriptless-signature.
Right off the bat I was wondering if an output always being 34 bytes, more than the 22 bytes of the super efficient P2WPKH, would cause users to not migrate to Taproot. It turns out though that P2WPKH is only about 1.5% more efficient across the whole transaction, and only in best case scenarios, which aren’t common. So there is no reason to not always use P2TR.
A P2TR output is a SegWit v1 output. This is the first change to leverage the versioning system SegWit introduced. I have to imagine this made the change a lot easier to roll out since SegWit-aware nodes would see the 1 at the start of the output and know “Hm, it is a SegWit transaction, but I only understand v0, so just going to auto-approve”. Compare this to the big shifts which P2SH and SegWit itself had to make to node validation logic. After this fork in the road though, all the supported validation can be different…and with taproot it is pretty different.
OP_PUSHNUM_1 OP_PUSHBYTES_32 9515a27af408da5b84ffbfdf376fe3ef459c5dea7d56e45f6054a6304ebb47b7
Example P2TR output.
The Taproot v1 output contains just a 32 byte Schnorr public key, the taproot output key. Anything with more bytes would just be considered an “I don’t know what this is, stamped”. I was curious why this was a full public key instead of a hash’d one, which previous output templates have made use of to save on blockchain space, and BIP341 mentions in point #2 of the rationale that the size trade-off wasn’t worth some of the collision implications. So all Taproot outputs look like “pay to pubkey”.
Since the Taproot scripts are using Schnorr, a single pubkey possibly covers a few common use cases, like pay to person and pay to multisig. What is neat is that even after these outputs are spent, there is still no indication onchain about which use case was used.
The Taproot script validation has a tad more going on though. The first big logical fork depends on how many items are in the witness stack. If there is just one, it is assumed that this is a signature for the output pubkey. This is called a keypath spend in Taproot lingo since no further script is involved. These will probably make up the bulk of P2TR outputs since they cover the common use cases and are the cheapest onchain. But a more complex output requires a scriptpath spend.
Scriptpath spends are where the MAST structure is placed so only the script being spent is revealed onchain. The Taproot validation assumes it is a scriptpath spend if there is more than one item on the witness stack.
Even in the simplest keypath spends, it is recommended that wallets still tweak the key once so that there is no possible script path. The taptweak is the commitment hash used to tweak the initial internal key. There is nothing consensus-wise to enforce this since no one can see if you did or didn’t, but it simplifies what is being hashed and tweaked. A keypath taptweak commits to just the public key, while a scriptpath taptweak commits to the public key concat’d with the merkle tree root.
// Keypath key from a taptweak of just the internal key P.
Q = P + H(P)G
// Scriptpath key from a taptweak of the internal key P and the root of the merkle tree.
Q = P + H(P||Merkle_Root)G
Best practice to always tweak a key to remove undefined use cases.
The script tree is a merkle tree, and the leaves are the alternative scripts. This isn’t an abstract syntax tree where there are conditional “branches” (e.g. if this go here, else go there). Instead, each leaf is an alternative, independent script. If there is just one leaf script then it is the root.
The witness of a scriptpath spend needs to contain the script itself and any data to satisfy the script, just like P2SH outputs of old. But then it also needs a few more bits of data to confirm that the script was committed to in the merkle tree. The witness needs the initial internal public key (pre any tweaks) and the merkle path from the script’s leaf node to the root. The merkle path is the set of branch hashes necessary to hash with the script all the way up to the root.
Initially this sounds like a lot of extra byte overhead for a simple P2SH. And it might actually end up costing something like an extra 32 bytes in the “single script” case. But since the overhead grows logarithmically, it only takes a few alternative scripts for this to pay off fast.
public keys and parity bits
Taproot switches to the Schnorr signature scheme, but it is using the same secp256k1 curve for public/private keys. The public keys are encoded differently though. Public keys are points on a curve (vs. private keys, which are 256-bit/32-byte scalars), so they are made up of a 32-byte x and a 32-byte y value. A “compressed” encoding of a public key is just the x value and a single byte used to flag if the y value is even or odd, 33 bytes total. In taproot, the public key is just 32 bytes and y is assumed to be even. This saves a byte, but introduces some extra overhead when validating tweaked keys on script spends. A parity bit needs to be passed along to flag odd keys.
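To make the two encodings concrete, here is a minimal Python sketch of both (my own illustration, not any library’s API):

```python
def compressed_pubkey(x: int, y: int) -> bytes:
    # pre-taproot 33-byte encoding: one parity prefix byte plus the x value
    prefix = b"\x02" if y % 2 == 0 else b"\x03"
    return prefix + x.to_bytes(32, "big")

def xonly_pubkey(x: int) -> bytes:
    # taproot's 32-byte encoding: just x, y is implicitly even; a signer
    # whose point has an odd y negates their private key so -P has even y
    return x.to_bytes(32, "big")
```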
merkle paths
Ok, so you have a witness script you want to add to a UTXO. How exactly do you hash that up and embed it in the internal key with a tweak?
At the bottom of the MAST, the leafs are hash’d with hash(leaf_version || compact_size(s) || s) (BIP341 specifies a tagged hash here, with the TapLeaf tag).
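Here is a rough Python sketch of the hashing based on my reading of BIP341 (the helper names are mine; the tags keep a leaf hash from ever colliding with a branch hash):

```python
import hashlib

def tagged_hash(tag: str, msg: bytes) -> bytes:
    # BIP340/341 tagged hash: sha256(sha256(tag) || sha256(tag) || msg)
    th = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(th + th + msg).digest()

def tapleaf_hash(script: bytes, leaf_version: int = 0xC0) -> bytes:
    assert len(script) < 0xFD  # keeps the compact_size serialization to one byte
    return tagged_hash("TapLeaf", bytes([leaf_version, len(script)]) + script)

def tapbranch_hash(a: bytes, b: bytes) -> bytes:
    # children are sorted before hashing, so a spender's merkle path
    # doesn't need to carry left/right markers
    left, right = sorted([a, b])
    return tagged_hash("TapBranch", left + right)

# two alternative leaf scripts -> merkle root -> tweak of internal key P:
# root = tapbranch_hash(tapleaf_hash(script_a), tapleaf_hash(script_b))
# tweak = tagged_hash("TapTweak", internal_key_x + root)
```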
Wallets
A bitcoin wallet is a special type of client application. The name “wallet” is kinda misleading, since a new user might assume the application is “holding” coins like a real-world wallet holds bills. But coins only exist on the public blockchain. Wallet apps hold the secret materials necessary to move coins on the blockchain. This is commonly the private keys used to sign transactions, but there could be other details needed for special scripts.
coins vs. accounts
The bitcoin blockchain only keeps track of how coins are locked up. When you hear terms for applications like “wallets” you might assume there is more of an account system under the hood. But that isn’t the case. For a bitcoin wallet app to show a user’s “account balance” in bitcoin, the wallet client software needs to know what outputs (usually derived from public keys) the user owns. It can then scan the blockchain and sum it up for them, but there is no simple “account” primitive on the chain.
Some other blockchains, like Ethereum, actually use an account system under the hood. Transactions are not “unlock, lock”, but instead are “spend from balance, add to balance”. It is a subtle difference, but has a butterfly effect on all the systems built on top. This results in all sorts of technical trade-offs, but the one I am most interested in relates to user privacy. The account system has technical incentives for users to re-use their account instead of creating a new one per-transaction. The coin output model has no such incentive. This incentive lowers the privacy of the account system since it makes it easier to connect users to transactions. These little incentives tend to snowball in complex systems, which in this case, severely affects user privacy.
hierarchical and deterministic // BIP32 + BIP39
While there are infinite ways to lock bitcoin, most of the time it comes down to “the person with this secret private key can unlock these bitcoin”. And the private key is just some 256 bits. Humans are not very good at memorizing 256 0’s and 1’s though. And while you could use one private key your whole life, that is terrible for security and privacy. Ideally, you use a new private key per transaction. But wow, that is a whole ton of 0’s and 1’s! Get just one wrong and you are out of luck. The good news though is some patterns have emerged to help us out.
So how should someone go about choosing their secret 256 bits? And once they have them, how do they make them easy to remember? You could choose 256 0’s in a row, that would be really easy to remember. But it would also be easy for someone else to guess. So we want the bits to be totally random, but memorable…kinda at odds with each other. This is where BIP39 comes in.
BIP39 defines a process where some number of random bits are instead represented by just a handful of words. All a user has to do then is remember the words, which humans are generally pretty good at. There are a few knobs to turn in the algorithm, but the popular version represents 128 bits as 12 words. But how? The key is defining a mapping of 2048 words to 11 bits. 2^11 = 2048, so each permutation of 11 bits is assigned a word. If you choose 12 words at random from the list, you now have 11 x 12 = 132 bits of entropy (randomness). And don’t worry, the list of 2048 words is already defined in BIP39 so we ain’t changing that any time soon. Even cooler, the words were selected in such a way that you can identify a word from just its first four letters.
BIP39 includes a more clever checksum approach to choosing words though. The computer generates 128 bits of entropy and then takes the SHA-256 hash of it. The first 4 bits of the hash are the checksum and that is concatenated with the original 128 bits. You now have 132 bits, exactly 12 words, but with the added benefit that the checksum is a quick way to check the other 128 bits. Maybe the user wrote down a wrong word or something.
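A minimal sketch of that entropy-to-words mapping (assuming `wordlist` is the 2048-entry BIP39 list loaded elsewhere):

```python
import hashlib

def entropy_to_mnemonic(entropy: bytes, wordlist: list[str]) -> str:
    # 128 bits of entropy + 4-bit checksum = 132 bits = 12 x 11-bit indexes
    assert len(entropy) == 16 and len(wordlist) == 2048
    checksum_bits = len(entropy) * 8 // 32           # 4 bits for 128-bit entropy
    first_hash_byte = hashlib.sha256(entropy).digest()[0]
    bits = (int.from_bytes(entropy, "big") << checksum_bits) \
           | (first_hash_byte >> (8 - checksum_bits))
    total = len(entropy) * 8 + checksum_bits         # 132
    return " ".join(wordlist[(bits >> (total - 11 * (i + 1))) & 0x7FF]
                    for i in range(total // 11))

# import secrets; print(entropy_to_mnemonic(secrets.token_bytes(16), wordlist))
```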
But why 128 bits? Why not represent a full 256 bit private key? Or even more, say 512 bits? Well, nothing is really stopping you, but then you have to remember your special scheme. BIP39 was designed to compose with BIP32. We will get into the details of BIP32 later, but the high level goal is to take 128 bits and expand them to 512 bits. Off the cuff, this sounds a little insecure. We are mapping a 128 bit space (2^128 possibilities) to a 512 bit space (2^512 possibilities)? That means, at best, only a tiny fraction of the 512 bit space can ever be generated! But the power of cryptography is here to save us.
Cryptographic hash functions are used to map the 128 bits to 512 bits in a process known as key expansion. By definition, the functions have uniform distribution (don’t skew to one end of the field), are unpredictable (no idea where an input lands in the field), and are one-way (if you start at a point in the field, no idea what input gets there). This means it is infeasible to identify which numbers in the 512 bit field actually map back to the 128 bit field. If an attacker knows I used 128 bits to make my 512 bits, it gives them no extra information to find my 512 bits. They would have to just brute force the 128 bit space, which is still computationally infeasible.
So to answer the earlier question, 128 bits are used because it is still secure and easier to memorize 12 words instead of 48. It is a trade-off.
BIP39 adds one more layer of security which I think brings a lot of UX complexity. In the key expansion process, right before the 128 bits are sent through the hash function ringer to be expanded, a salt is added. The salt is the static string mnemonic concatenated with a user supplied passphrase, sometimes referred to as the 25th word (which is confusing cause it is totally different than the other words, and not even required). Essentially just tossing in some randomness so the output lands in a completely different part of the field. This brings a few subtle security upgrades, at the cost of the user needing to remember another word. And while there is no specified maximum length, there are performance costs to making it really long. Not to mention harder to remember.
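For reference, the expansion BIP39 specifies is PBKDF2 with HMAC-SHA512 over 2048 rounds; a minimal Python version of it:

```python
import hashlib, unicodedata

def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    # BIP39 key expansion: 2048 rounds of PBKDF2-HMAC-SHA512, salted with the
    # static string "mnemonic" plus the optional passphrase; the 64-byte
    # (512-bit) result is the seed handed to BIP32
    norm = lambda s: unicodedata.normalize("NFKD", s)
    return hashlib.pbkdf2_hmac("sha512", norm(mnemonic).encode(),
                               norm("mnemonic" + passphrase).encode(), 2048)
```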
Salts do help protect against any pre-computation done by attackers (rainbow tables), but there are a few steps in key expansion which already protect against that, including the large input space (2^128) and sheer number of calculations per round. The static salt, mnemonic, is obviously known to everyone so it doesn’t help against rainbow tables, but it does give a nice namespace, reducing any chance of collisions.
Wallet apps have used the passphrase as a way to create 2 wallets for a user: one with the passphrase and one with the same words, but a blank passphrase. This opens the door to some plausible deniability for the user. But the different uses of the passphrase across wallet apps has led to fragmentation and a confusing user experience. Hopefully that gets iron’d out someday.
So with all that said, what does BIP32 actually define? It brings the hierarchical determinism.
Let’s say a user wants to control their keys from multiple devices, and they are following best practices to create a new key per transaction. So they have a bunch of private keys lying around. One way to do it is copy/paste all the private keys and keep them in-sync between devices as they are generated. Feels just a tad error-prone though. BIP32 describes an algorithm wallet software can use to generate (or re-generate) keys from a single secret seed. So given just one secret, you have infinite keys (or at least as many as you need in a lifetime). BIP32 also defines a way to describe paths to keys from the seed, so something like “the third key down on the right”. These allow users to give some hierarchy, like a tree with leaves, to their keys if they so choose (e.g. these keys are for business X and these over here are for business Y). As one can imagine, it is very important for a user to keep the seed secret since it is the key to all the keys.
It is pretty easy to see how you can take BIP39 to define a secret seed, plug that into BIP32, and bam, you now have an easy to memorize secret which gives you infinite addresses.
The 512 bits gives BIP32 a little more wiggle room to start spitting out keys with some cool properties. A naive key generation algorithm is just taking the hash of a 256 bit private key to get a new private key. This can go on forever, but if just one is leaked, the following keys in the chain are all compromised. BIP32 uses its extra bits to put in some protection against this and more.
These are called extended keys, which come in public xpub (extended public key) and private xprv (extended private key) form. These are just like normal public and private keys, but include extra information for generating keys in their tree. One cool feature of BIP32 is that it’s possible to generate just the public key side of a tree using some elliptic curve maths. This can be useful to generate receive addresses or run a watch-only wallet, but these keys should still be protected for privacy.
A layer of security BIP32 brings is hardened keys, which are kinda like a firebreak. Non-hardened keys can be compromised if an attacker gets the xpub plus a child private key. This is enough information to calculate the parent private key with some algebra. The attacker can then derive all child keys, essentially hacking the whole tree. If instead a hardened key is used, even if a child private key is compromised, the parent private key cannot be calculated. So the rest of the tree is safe.
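A sketch of hardened child derivation, following my reading of BIP32 (non-hardened derivation additionally needs elliptic curve math, so it is omitted, and the astronomically rare invalid-key edge cases are ignored):

```python
import hashlib, hmac

# order of the secp256k1 group
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def ckd_priv_hardened(k_parent: int, chain_code: bytes, index: int) -> tuple[int, bytes]:
    # hardened indexes start at 2^31; the *private* parent key goes into the
    # HMAC, so an xpub plus a leaked child key reveals nothing about siblings
    assert index >= 0x80000000
    data = b"\x00" + k_parent.to_bytes(32, "big") + index.to_bytes(4, "big")
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    k_child = (int.from_bytes(digest[:32], "big") + k_parent) % N
    return k_child, digest[32:]  # the child private key and child chain code
```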
So you could just flip a coin 256 times and make a single private key, but why miss out on all this fun?
output descriptors
Output descriptors are a standard way to describe outputs and either the public or private materials needed to create them. This was kinda tried before with the ZPUB and YPUB keys, but they were very specific and kinda muddied the waters for how to describe outputs.
For XPUBs, the stuff in the [...] is key origin metadata.
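As a made-up example (the fingerprint and truncated xpub are placeholders, not real keys), a descriptor for native SegWit receive addresses might look like:

```
wpkh([9a1d520b/84h/0h/0h]xpub6CUGRU.../0/*)
```

The [9a1d520b/84h/0h/0h] chunk is the key origin (master key fingerprint plus derivation path), the xpub is the extended public key, and the /0/* on the end says to derive addresses along the external (receive) chain.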
filters // BIP37 + BIP157 + BIP158
If a user knows the addresses which they own the private keys for, they might not want to run a full bitcoin node just to check in on a few spots in the blockchain. Instead, they would rather just ask an already running node, which they trust, to look up those spots for them. This type of application is described as a “light client” since it offloads some of the heavy responsibilities.
This is fine if a user fully trusts a node, but ideally they could ask any node in the network for this information. The problem here is that it is a huge privacy leak. A node that is constantly receiving requests about a certain address from the same IP can start to connect the dots as to who owns the bitcoin. A client could try and hide their address by asking the node to send them a ton of other addresses which they don’t actually care about, but then the “light client” becomes a lot less light.
A series of BIPs have been proposed and implemented to try and address these privacy and performance concerns. The first of which was BIP37, which added Bloom Filters to the peer-to-peer protocol.
A bloom filter is a probabilistic data structure which is useful for efficiently testing membership in a set. Probabilistic means that the bloom filter might not always return the right answer; however, it guarantees to only return false-positives, never false-negatives. It might say a member is in the set when it actually isn’t, but it will always say a member is in the set if that is true.
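A toy Python version of the idea (BIP37 itself specifies murmur3 hashing and formulas for sizing the filter; sha256 here is just to keep the sketch self-contained):

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits: int = 1024, k_hashes: int = 3):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: bytes):
        # derive k bit positions by salting the hash with the function index
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def maybe_contains(self, item: bytes) -> bool:
        # False is definitive; True only means "probably in the set"
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```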
There are a lot more complexities to BIP37’s protocol, but one can see how the bloom filter is a nice fit for the use case. The light client can register some sort of address filter with a node, and the node can check the filter and only return relevant info to the client. The client might receive a little extra data, but they can discard it. And the extra data helps hide their actual addresses. Bloom filters are configurable to be more or less accurate, so a client can choose more privacy at the cost of more trash data.
BIP37 has some flaws though, including privacy leaks. There are quite a few ways an attacker can abuse the protocol to narrow in on what addresses a client is interested in, like probing or correlations over time. Having different filters per-connection reveals something about the client and puts an asymmetric burden on the servers, often opening them up to DOS attacks.
BIP157 and BIP158 were introduced to address these issues by moving the filtering to the client side. BIP157 defines the new protocol and BIP158 defines the new filter structure. The idea is to flip where filters are created and stored. Instead of clients sending them to nodes, nodes create a compact filter (a Golomb-coded set…fancy new data structure) per block. This filter is block specific and can be sent to any client, so the asymmetric work issue of BIP37 is removed. When a client receives a filter for a block, it can quickly check whether it is actually interested in any data in the block. The client doesn’t even have to ask the same node for that block. It can go to any node, which helps maintain its address privacy.
Lightning
The Lightning Network is a system built on top of bitcoin. The bitcoin protocol establishes leaderless consensus, but one of the requirements for that is limited block space. This caps the number of transactions per second on the bitcoin ledger. Assuming blocks are always full due to usage, there emerges a market for getting a transaction on the blockchain. This prices out small transactions.
Lightning allows bitcoin to scale while not sacrificing any of the leaderless consensus. Small transactions are possible again…but how is this best-of-both-worlds achieved? Lightning is another protocol, again using game theory and applied cryptography, but layered on top of bitcoin transactions. The layer term is a bit loaded in bitcoin, but the most important property of the protocol is that at any point, either party has full control of their funds. No trust enters the picture.
The scope of the lightning protocol is limited to two parties instead of a global ledger like the bitcoin protocol. An abstract payment channel is established between two parties which is backed by a bitcoin transaction. Once the channel is established, the two parties can now use more of that game theory + applied cryptography magic to send near instant, cheap payments to each other while still remaining trustless. While all bitcoin transactions on the blockchain are public and completely unambiguous, payments in a payment channel are only known to the two parties. This is how the costs are kept down and the speed fast. If the two parties ever disagree on the state of the channel, then that bitcoin transaction between the two parties is used as a judge to remove any ambiguity by leveraging the bitcoin blockchain’s global consensus. This dispute resolution involves blockchain fees, so is generally avoided unless absolutely necessary.
So bitcoin transactions are cheap and fast again!…if between 2 parties. But what if these payment channels are linked together to form a network of sorts…a lightning network…
payment channels
Payment channels are digital contracts which leverage two concepts not seen in the simple bitcoin transactions.
- Creating valid bitcoin transactions, but not immediately broadcasting them to the bitcoin blockchain. These are often referred to as pre-signed transactions, but I find that a bit of a misnomer. Technically a valid transaction doesn’t require a signature (just usually), and what is special about these transactions is that they are not yet on the blockchain.
- Timelocks which push out when outputs can be spent.
in the beginning…bounded unidirectional
A unidirectional channel with a set end time (e.g. will close 3 days from now) is pretty easy to model with these ideas in mind.
- A writes a transaction with a 2-of-2 multi-signature output, but doesn’t broadcast it. A is paying B (unidirectional), so A funds the whole transaction.
- Before broadcasting this “funding” transaction, another transaction is created based on the funding transaction output. This “refund” transaction is a special case “commitment” transaction which sends all the funds back to A with a transaction timelock of 3 days.
- B signs this first commitment transaction so that A is willing to trust the payment channel.
[ A | 2 of 2 of A and B] -- [ SigA SigB | A's balance of 100% / A ]
| B's balance of 0% / B ]
The funding transaction completely funded by A, and the first commitment transaction with both signatures and, most importantly, a nlocktime of 3 days (which isn’t captured in the ascii art…my bad).
What are the incentives of the first commitment (refund) transaction?
- B doesn’t really care; they don’t have any bitcoin locked up in the funding transaction, so even if A refunds everything it is like nothing happened.
- A knows they can get all their money back even if B disappears; they just have to wait 3 days to broadcast the first commitment transaction in order to satisfy the nlocktime requirement.
After A broadcasts the funding transaction, A and B can trade within the payment channel by A exchanging new commitment transactions. These transactions are not broadcast to the blockchain, but they are still valid transactions. Each commitment transaction is a possible closing transaction with one output going to A and one to B. And, very important, these transactions do not have timelocks, so they can be broadcast whenever.
[ A | 2 of 2 of A and B] -- [ SigA <SigB> | A's balance of 99% / A ]
| B's balance of 1% / B ]
Updated commitment transaction which A sends to B. B doesn’t need to sign it yet, and this transaction does not have a timelock!
Since this is a unidirectional channel, the transactions are updated so that A pays B a little bit more with each new transaction. This is done by A receiving less from the funding transaction output when the channel is closed. A signs the transaction and gives it to B as payment. B does not have to sign or broadcast the transaction yet, but it is valuable to them since their output now has more bitcoin.
What are the incentives during the payment channel’s lifetime?
- All of the commitment transactions are valid, but B has an incentive to sign and broadcast the one which gives them the most bitcoin. In this model, that is always the most recent one.
- A can’t steal by broadcasting any of the old commitment transactions which give them more money, because B never signed them.
- If A attempts to steal by broadcasting the original “refund” commitment transaction, B has 3 days to sign and broadcast a commitment transaction and pull from the funding transaction UTXO first.
- B must broadcast a commitment transaction before the 3 days are up to ensure A can’t steal by broadcasting the original “refund” commitment transaction.
The timelock’d output of the not-yet-broadcast refund commitment transaction allows both A and B to trust the digital contract of the payment channel. But ideally funds could flow in both directions, and the contract could last longer than some arbitrary timelock…
indefinite bidirectional
Good news! It’s possible. But indefinite bidirectional channels require more unpublished transactions and timelock layering to make sure incentives stay aligned.
Bidirectional channels are funded with a transaction that has a 2-of-2 multisig output, just like unidirectional ones. Commitment transactions still have an output for each balance in the payment channel. If we kept the same protocol from the simpler unidirectional model, B could send a commitment transaction that pays B less than the one before as a form of B paying A. But there is nothing stopping B from broadcasting an old commitment transaction which A has already signed, giving B more bitcoin. We can no longer depend on the simplicity of the single timelock’d refund commitment transaction.
[ A | 2 of 2 of A and B] -- [ SigA <SigB> | A's balance of 100% / (A && after 3 days) || (B && revocation secret) ]
| B's balance of 0% / B ]
A’s refund commitment transaction.
[ A | 2 of 2 of A and B] -- [ SigA <SigB> | A's balance of 100% / A ]
| B's balance of 0% / (B && after 3 days) || (A && revocation secret) ]
B’s refund commitment transaction.
What is all this??
In the unidirectional model, the timelock delay was used to give B an opportunity to thwart A from attempting to steal. In the bidirectional model, both parties can attempt to steal, so both need a window to thwart the other. Instead of there being one refund commitment transaction, there are two, one for each party, and the transactions are mirrors of each other (technical term: asymmetric). A constructs a transaction which timelocks A’s funds and B constructs one which timelocks B’s funds. Each party signs the other’s transaction to “commit” to the channel, just like how B signed the refund transaction in unidirectional-land. One important note here too: in unidirectional-land the timelock was on the transaction, but here we are using the more flexible timelock in the locking script.
And that “revocation secret” part? In the unidirectional model, B could broadcast any commitment transaction to thwart A from stealing, since the commitment transaction can immediately spend the funding transaction’s UTXO. In this bidirectional model, what happens if A broadcasts an old commitment transaction giving them more than the current commitment transaction? B has three days to thwart A, but all they have is another transaction that spends from the same output (useless)…or do they? The revocation secret is a new part of the protocol which allows B to punish A (and vice versa) in the event that they publish an old commitment transaction. B gets to take A’s balance instantly while A is timelock’d out.
The unidirectional model had a very straightforward handshake. A asks B for a funding transaction signature and then A exchanges commitment transactions for whatever B is offering. The incentives are very clear. How is this handshake accomplished in the bidirectional world with its multiple commitment transactions and revocation secrets?
I was surprised when I didn’t find a detailed description of the handshake in BOLT02 where this protocol is defined. I believe that is because the order doesn’t really matter: the dominant incentive is for the side gaining balance to check two boxes before considering the channel state updated.
- A signature from the other party for the new commitment transaction
- The revocation secret from the other party to invalidate the other party’s old commitment transactions (so no take backs)
So, the state update protocol from a high level:
- Both parties create their own new asymmetric commitment transactions.
- Parties trade signatures for the new commitment transactions and old revocation secrets.
A nice effect of always creating new commitment transactions for every state change is that the channel can be open indefinitely! Indefinite, bidirectional channels, very cool. But what if you want to pay someone you don’t have a channel with?
multihop
Not all nodes in the network are directly connected, but it is still possible to make payments from any node to another! These multi-hop transaction contracts are called Hash Time-Lock Contracts (HTLCs) and are how payments are routed across payment channels as of today (could change with new tech in the future).
(A) -> (B) -> (C)
A is routing a payment to C through B.
Theoretically, A could pay B and B could pay C, but that sounds like a whole lotta trust. How do we make this trust-less? Incentives and timelocks baby.
A deep dive on HTLCs.
malleability
A little trip down memory lane: the Lightning Network protocol’s implementation required a bitcoin soft-fork back in 2017 (technically “required” is a bit dramatic, but keepin it simple for now). Before the soft-fork, it was possible for an unpublish’d transaction’s ID to change. What’s so bad about that? Well, the first commitment transaction for a funding transaction needs to know the funding transaction ID in order to route the output back as a refund for the initiator. If that ID can change, it would be very risky for an initiator to open a channel. There is a chance the funds just all end up going to the other party.
The Segregated Witness feature introduced in the 2017 soft-fork made unpublish’d transaction IDs immutable. Before the fork, witness data (the signatures to unlock an output) were included in the transaction ID. Small modifications to the signatures would still unlock the output, but result in a different ID. “Segregating” the witness data meant puttin it over there (in a different data structure) so it was no longer included in the transaction ID hash. Now we can link unpublish’d transactions together, which unlocked the door for lightning.
millisatoshi
The smallest denomination on the bitcoin blockchain is the satoshi, which is 1/100,000,000 of a bitcoin. The Lightning Network protocol is backed by un-published (but publish-able) bitcoin transactions, which would seem to make the satoshi the smallest amount you can transact on the Lightning Network…or is it? What is the fee of a lightning payment of 1 satoshi? This shouldn’t be free since it requires a bit of resources, but is it possible to charge for it? The Lightning Network is almost always off-chain, so technically it doesn’t need to play by the rules as long as consensus can still be backed by the bitcoin transactions (and both parties buy into that).
To address this and future-proof the protocol (e.g. what happens if in the future 1 sat == $10 USD?), the lightning protocol introduced the millisatoshi (1/1000 of a satoshi), sometimes referred to as “msat”. This allows for fees to be collected on the millisatoshi level, sub-satoshi. The two parties on either side of a lightning channel keep track of things at this level as well, but floored to the nearest satoshi for the bitcoin transactions. Anything “extra” goes to the miner fee for the transaction.
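A tiny sketch of that flooring accounting (my own illustration, not any implementation’s code):

```python
def commitment_amounts(balance_a_msat: int, balance_b_msat: int) -> tuple[int, int, int]:
    # channel state is tracked in msat, but commitment transaction outputs
    # must be whole satoshis: each balance is floored and the sub-satoshi
    # remainders are simply left to the miner fee
    a_sat, b_sat = balance_a_msat // 1000, balance_b_msat // 1000
    leftover_msat = (balance_a_msat % 1000) + (balance_b_msat % 1000)
    return a_sat, b_sat, leftover_msat
```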
Not sure I love it to be honest, feels kinda hacky and error-prone. But without it, there would definitely be a floor of minimum-viable-transaction-size which would be a bummer too.
sub-htlc payments
An HTLC is a way to escrow funds between a sender and receiver. At the technical level, an escrow’d HTLC is a UTXO on the payment channel commitment transactions. The sender is OK with risking these funds because they either get what they want in return (the preimage of the payment) or the UTXO can be swept back to them after a certain amount of time. The receiver isn’t risking funds; they just have an incentive to get the preimage so they can collect the funds for themselves.
Generally, HTLC UTXO’s are created, but never broadcast to the network. Instead they are “folded” back into the commitment transaction balances per-node. If a channel “fails” though (one of the nodes no longer trusts the other), a commitment transaction is broadcast. If an HTLC was in-flight the commitment transaction has the extra HTLC UTXO and it needs to be swept up by its rightful owner.
Because HTLC UTXO’s need to be broadcast-able, there is a class of lightning network payment types which are not escrow’d as UTXOs. Sounds sketch! In reality though, I think the risk is minimal.
The two categories of sub-htlc payments:
- Actual millisatoshi, a.k.a. sub-satoshi, payments. The blockchain’s lowest denomination is satoshi so an HTLC UTXO has to be at least one sat.
- HTLC UTXOs which would be considered dust UTXOs. This is a moving target based on the blockchain space market.
Both scenarios are payments of small amounts. But it feels bad to lose the HTLC protections. How are these small payments handled if they can’t be UTXOs?
The small amounts are still escrow’d out of the sender’s balance, but instead of a UTXO, they are pointed at the miner fee of the commitment transaction. An HTLC UTXO goes to the receiver or the sender even if the channel fails, but a sub-HTLC payment will go to the miner if the channel fails. Do the incentives radically change? The sender is taking on more risk because if the receiver disappears they will not get the escrow’d funds back, they will go to the miner now. But the receiver doesn’t have a new incentive to disappear (unless they are a miner…). For small transactions, this seems like an OK tradeoff.
channel reserves
There are a lot of little complexities to payment channels which make it much harder than expected to calculate, at any given point, how much bitcoin you can send from a channel. One of those little complexities is channel reserves. These are described in bolt02. Here is the gist: Alice and Bob have a payment channel and over time all the liquidity ends up on Bob’s side. In other words, if the channel is closed all the funds would be sent to Bob. What is stopping Alice from broadcasting an old channel state where she gets some of the funds instead? Usually, Bob would broadcast a penalty transaction if Alice tries this maneuver and get all the funds in the channel. But that is already the case! Alice has nothing to lose since the penalty transaction is the same as the current state, might as well try and steal some back.
Enter channel reserves. These are the minimum amounts of liquidity to be kept on both sides of a channel to ensure skin is in the game and penalty transactions hurt. Initially when a channel is opened, it’s possible for one side to not meet its reserve requirement, but it is built up over channel usage.
anchor outputs
Commitment transactions are usually signed by both parties a good while before they are broadcast (if necessary). It is possible that the block space market could dramatically change before a broadcast attempt and the transaction’s fee would no longer be high enough to make it in. The transaction needs a fee bump. Special UTXOs are added to commitment transactions as “anchors” for future CPFP transactions. One anchor per user so each has the ability to bump a fee. Ideally these outputs would have almost no value, just there in case necessary, but that doesn’t vibe with node policies to re-broadcast transactions since they would be dust. Special “carve out” policies have been implemented in bitcoin core just for these anchor outputs.
Anchor outputs allow for more flexibility when it comes to how a node operator allocates funds for worst case scenario force closes. Without anchor outputs, each channel has to set aside funds in miner fees to ensure the commitment transaction can make it in a block. But with anchors, these fees can be set much lower (I am wondering if they can be set to zero…), which allows the full capacity of the channel to be used for routing. However, the node operator will need some onchain funds set aside to use in case they need to create a CPFP transaction to pull in a commitment transaction. At least now though they can decide how much risk they are willing to take on across their channels. Maybe they will hold enough in reserve to CPFP all channels if necessary. Maybe just one, up to them.
timelocks
Bitcoin transactions can have absolute or relative timelocks. Timelocks are obviously a big part of the lightning protocol, allowing parties to remain trustless. Multiple parts of the protocol use timelocks: the commitment transaction balance outputs as well as the HTLC outputs. Interestingly though, relative timelocks are used for the balances, while absolute timelocks are used for the HTLCs.
The commitment transaction balance outputs have a timelock to give a counter-party time to pull funds back if a party tries to cheat by broadcasting an old commitment transaction. You wouldn’t want this to be an absolute timelock, since the attacker could just wait to broadcast in order to minimize the window the counter-party has to act. This is a use case for relative timelocks, so that there are always at least X blocks for the counter-party to act.
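For a feel of what this looks like, here is roughly the shape of the to_local balance output script from BOLT03 (simplified; consult the BOLT for the authoritative version):

```
OP_IF
    <revocation_pubkey>
OP_ELSE
    <to_self_delay> OP_CHECKSEQUENCEVERIFY OP_DROP
    <local_delayed_pubkey>
OP_ENDIF
OP_CHECKSIG
```

The OP_IF branch lets the counter-party spend immediately with the revocation key if an old state was published; the OP_ELSE branch makes the owner wait out the relative to_self_delay via OP_CHECKSEQUENCEVERIFY.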
HTLC outputs are timelock’d so that a router knows their funds are safe if a commitment transaction goes to chain and the forward party disappears. If that was the only requirement, I think relative or absolute timelocks would both work. But there is a worst case scenario for a router where an HTLC fails and both the in and out commitment transactions go to chain. The router expects to wait out the timelock on the HTLC output of the out commitment transaction. But what if the receiver somehow gets the preimage and spends that UTXO first, and this happens right as the sender’s timelock expires, so the sender pulls back the in UTXO? The router loses funds: they paid but were never paid, and they never got a chance to act. Relative timelocks depend on when the transactions are confirmed on the chain, but to keep the HTLC protocol trustless for the router, the timelocks of the two outputs must depend on each other. The router needs a guaranteed window of time to either #1 pull the in commitment output if the preimage is exposed at any point, or #2 pull back the out commitment output when the timelock expires. The timelock for the out commitment output must come before the in commitment output’s. A relative timelock cannot guarantee this ordering, whereas an absolute one can.
Both scenarios are about ensuring there is a window of at least X blocks for a counter-party to act, but the complexities of the protocol require different implementations. For the relative balance outputs, the window implicitly begins when the transaction is confirmed and ends with the CSV settings. For the absolute HTLC outputs both sides are explicit, the window begins with the CLTV setting for the out commitment transactions and ends with the CLTV setting for the in commitment transaction.
liquidity
Opening a channel requires a bitcoin transaction. Closing a channel requires another bitcoin transaction. This encourages long-lived channels to avoid the transaction fee overhead. Outbound (local) liquidity is owned by the node operator. Inbound (remote) liquidity is owned by the counter-party.
There are two fee settings.
- Base Fee – Fixed fee charged each time a payment is routed through the channel.
- Fee Rate – A percentage fee charged on the value of the payment (liquidity fee), calculated as amount * feerate / 1000000 (so the rate is expressed in parts-per-million).
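Putting the two settings together, a forward fee calculation looks something like this sketch (everything in msat; the parameter names are mine):

```python
def routing_fee_msat(amount_msat: int, base_fee_msat: int, fee_rate_ppm: int) -> int:
    # total fee = fixed base fee + proportional fee,
    # where the rate is parts-per-million of the forwarded amount
    return base_fee_msat + (amount_msat * fee_rate_ppm) // 1_000_000

# e.g. forwarding 50,000 sats (50_000_000 msat) at base 1_000 msat and 200 ppm:
# 1_000 + 50_000_000 * 200 // 1_000_000 = 11_000 msat, i.e. 11 sats
```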
The fee rate is different than onchain, where the cost is based on the size in bytes instead of the value being sent. In the LN, liquidity is the valuable resource, so the fee is based on the value. If this was 0, one huge transaction could dry up a channel and the channel would only earn the base fee. It probably wouldn’t cover its opening/closing onchain costs.
As a routing node (not a wallet or service provider), inbound liquidity is a zero-sum game. Inbound liquidity is transferred from one channel to another when a transaction is forwarded through the routing node. There are only a handful of ways for an operator to gain inbound liquidity for their routing node.
- A peer opens a channel to the routing node.
- The operator makes a purchase through the node.
- Purchase inbound liquidity through a Loop or Pool.
Option 1 is the best case scenario. It’s free for the routing node operator (the user opening the channel pays the onchain cost) and is 100% inbound liquidity. Problem is, users don’t want to open a channel to a new routing node because that is risky for them. A routing node must gain some initial respect (bootstrap) by opening channels, increasing its capacity, and building up-time (all hard earned and requiring capital). An operator could use a liquidity group of other operators to establish inbound to their nodes, but this requires a good amount of trust.
Option 2 is nice because the operator gets inbound liquidity plus whatever they actually bought through the lightning network. The only problem is the operator would need to be buying some expensive things to get the inbound necessary. Doesn’t hurt though.
Option 3 is a one time payment for inbound liquidity through a Loop (off-chain to on-chain) transaction. Theoretically, if an operator was then savvy with channel fees to keep things balanced afterward, the payment could be paid off through transaction fees while the channels remain balanced.
A circular rebalance is sending a payment through the network that starts and ends at the same node. The goal is to balance the liquidity on a node. Nice that it requires no onchain transactions. But could get expensive depending on the amount of liquidity to balance (fee rates) plus number of hops (unless controlled with a predetermined route).
Circular rebalancing is a bit of a dark art because you don’t want to pay too much in fees, but you are attempting to shift a large amount of liquidity compared to normal transactions.
- Pick the nodes you want to shift liquidity between: out == pubkey of the channel with too much outbound and in == pubkey of the channel with too little outbound.
- Choose the amount of sats to shift.
- Start guessing at max fee rates.
Running Bitcoin
Some quick tips for running a bitcoin node.
networks
Signet is similar in nature to testnet, but more reliable and centrally controlled. The bitcoin daemon process can be pointed at a signet. signet=1 by itself will point at the “default” signet. Custom signets need a signetchallenge and some server in addnode:
signet=1
[signet]
signetchallenge=something
addnode=address
bitcoin.conf
You can then check it’s all hooked up with something like bitcoin-cli -signet getblockcount.
remote connection
If connecting to a bitcoind instance from a different computer, local configuration can be placed in ~/.bitcoin/bitcoin.conf. I think at a minimum, rpcconnect, rpcuser, and rpcpassword are required.
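Something like the following on the client machine (the address and credentials are placeholders):

```
# ~/.bitcoin/bitcoin.conf on the machine running bitcoin-cli
rpcconnect=192.168.1.50
rpcuser=myrpcuser
rpcpassword=myrpcpassword
```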
watch-only wallet
I have only created watch-only wallets with the new descriptors syntax. There are older encodings, like ZPUB and YPUB, but I find them to be a confusing blend of private material generation stuff (HD wallets) and the type of outputs to scan for with those keys. So I am just sticking to the new thing, which has a nice dividing line.
Even further, I only tried this for P2TR outputs, so a descriptor that looks like tr(<XPUB>/<INDEX>).
bitcoin-cli’s createwallet command has a bunch of flags, but I think only the disable_private_keys one needs to be flipped to make it watch-only.
bitcoin-cli -named createwallet wallet_name=watch disable_private_keys=true load_on_startup=true
Create a watch-only wallet called “watch”. The -named flag makes it a little easier to parse the args.
Then you can load up a descriptor into the wallet with importdescriptors. You might have to use getdescriptorinfo first to get the checksum on the descriptor.
bitcoin-cli -rpcwallet=watch importdescriptors '[{ "desc": "tr(...)", "timestamp":1707933600 }, { "desc": "tr(...)", "timestamp":1707933600, "internal": true }]'
Importing a descriptor and a related change descriptor.
I went on a bit of a journey with the hardened syntax being h or ', and it looks like it causes others pain as well. Going to just stick with h in the future. I suspect this also explains why getdescriptorinfo appears to return different checksums: the checksum is computed over the literal descriptor string, so the h and ' variants are different strings with different checksums.