Point Encoding

// #Bitcoin

Elliptic curve points are comprised of an x and a y coordinate. The curve (group of constants) used in Bitcoin is secp256k1, which dictates that the x and y values are limited to 256-bits a.k.a. 32-bytes. Points show up in two spots on the blockchain: public keys and the R value in signatures. Bitcoin originally stored both the x and y coordinates of a point on the chain, taking up a total of 64-bytes. That can get expensive, so over the years, multiple encoding formats have been used to lower that heavy 64-byte cost. It has been a bit of a journey.

An elliptic curve over the set of real numbers has a symmetric shape which mirrors itself over the x-axis. This means for any x value we have two possible y values. In the simplest form of the curve, the one you often see in pictures, the “mirror” is the x-axis (determined by whatever constants the curve uses). This means that the two possible y values for a given x_1 would be y_1 and -y_1. They are the additive inverses of each other. This pattern can be leveraged to drop the number of bytes necessary to encode a point. Instead of storing a point as two 32-byte values, only the x coordinate is stored, and theoretically just one more bit could be used to tell the reader if the y coordinate is “above” or “below” the mirror. Bitcoin’s curve however, secp256k1y, isn’t over the set of real numbers, but a finite field where all the values are positive. The graph of secp256k1 doesn’t have as visually obvious of a “mirror” as the standard real numbers version, but luckily, we can still take advantage of a mirror to encode the points more efficiently.

The equation for secp256k1 is y^2 = x^3 + 7 (mod p) where p = 2^256 - 2^32 - 977 (a big odd prime number). If we solve for y, the equation on the right side is a square root. And due to properties of a quadratic residue in a finite field, there are two square roots just like we are used to in the set of real numbers. But, instead of each pair of roots being a positive and a negative, one is odd and one is even. So we still have a way to determine if a y value is “above” or “below” the mirror, we just encode if the y value is even or odd. This is the first form of compressed encoding Bitcoin adopted, compressed public keys, to drop the cost of writing a point from 64-bytes to 33-bytes. The first byte is where the odd or even flag is stored. 8-bits seems like a lot of space to store a single bit of data, and even weirder, the odd and even flags are 0x02 and 0x03. Turns out this was an existing well-defined standard, SEC 1: Elliptic Curve Cryptography (and that appears to be based on another standard?) SEC-1 uses that first byte to encode a lot of different things which are just not that relevant to the Bitcoin use case. 64 to 33 bytes is still a big win though, and using the existing standard is nice, but did leave a bit to be desired.

In the taproot soft-fork, BIP-340 introduced a new encoding format which drops the flag entirely. So 33 to 32 bytes, x-only encoding, by assuming that the y coordinate used is always the even one. This is extremely optimized for saving space on the blockchain, but introduced some issues as Bitcoin scripts got more complex. Limiting scope to just tapscript, what are the operations effected by this encoding?

The simplest operation is generating a public key. BIP-340 describes the general steps: get 32-bytes of random data, that is you private key, scalar-multiply it by the constant generator point G and you have your elliptic curve point public key. Now, how quickly should the point shed its odd or even y bit? If it does immediately and just represents itself as a 32-bytes, then technically there are two private keys which map to that public key. From a security perspective, this means very little due to the discrete log problem, but it does add some confusion for interface design, should an x-only key always be preferred? Let’s see for the other operations.

The signing operation is also defined in BIP-340 and has a few bells and whistles. The parts we are most interested in are the two curve points, the public key of the signer’s private key and the public nonce R. If a signer’s private key maps to a point with an odd y value, what do they do? Make a new key? Not that drastic, they just need to get the additive inverse of their private key, (p - x) (mod p). Both x (mod p) and (p - x) (mod p) map to the same x-only public key (a quadratic residue of the finite field), but the hash of the private key is used to determine R, so it is necessary to keep track of which root is being “used”. The private nonce k used to determine R uses the same pattern, its additive inverse is used if necessary, so s always uses the even-point-producing version of k. The BIP-340 signature verification protocol doesn’t allow for anything but x-only public keys, so there is no malleability, everything must be even-only.

This all seems fine so far, and I think if you live entirely in simple BIP-340 signature-land, it is just fine.

But then we get to BIP-341 which introduces the concept of taproot outputs. Taproot outputs, P2TR, are constructed by tweaking public keys with possible script spends. And tweaking is a general pattern used out-of-band to create many types of P2TR outputs. Tweaking is just point arithmetic, so there is no guarantee output points are always gonna be even-only. This isn’t a huge deal for the signature operation and its limited scope, but what if some output point involves adding more than two points together D = A + B + C? Are the intermediate states suppose to be inverted to even-only? A calculation will end up on a wildly different part of the curve if we assume so. BIP-341 lays down some rules for tapscripts, but it is up to other protocols to do the same as well which is unfortunate. Impossible to measure the total complexity this places on the ecosystem, but it may not be worth the one byte saved.

While we are here, what are the rules for BIP-341? Quick recap on taproot outputs, the taproot public output key is Q = P + tG where P is the internal key and t is the taptweak merkle tree root hash. A keypath spend would create a signature for Q and so would fall into the standard signature operation. Nothing too special there. What about a scriptpath spend? A scriptpath needs to prove that its script was committed too and then prove the script. Proving the commitment involves a data structure called the control block which has enough info to point to the script in the merkle tree. There is one bit in the control block called the parity bit (or sign bit) which records if Q has an odd or even y value. So P is assumed to be an even-only generated key, tG is only used at “runtime” so it always has its full definition, and Q can be deterministically checked with the parity bit. So the scriptpath proof can be completely checked. BIP-341 has a special note, Rationale #10, on the necessity of the parity bit. It does seem like it isn’t totally necessary, since there is only one arithmetic operation it is just like the signature operation where it narrows it down to just one of two values. But the note mentions that it is required for batch validation, where there are many arithmetic operations.

https://github.com/jonasnick/bips/issues/32#issuecomment-1177725159 • https://medium.com/blockstream/reducing-bitcoin-transaction-sizes-with-x-only-pubkeys-f86476af05d7 • https://bitcoin.stackexchange.com/questions/41662/on-public-keys-compression-why-an-even-or-odd-y-coordinate-corresponds-to-the-p • https://bitcoin.stackexchange.com/questions/120501/is-it-true-that-we-dont-care-about-parity-with-the-key-path-but-it-is-not-case • https://github.com/bitcoinops/taproot-workshop/blob/master/2.4-taptree.ipynb