Cryptography

// Bringin math to social problems

Do not try and bend the spoon. That's impossible. Instead... only try to realize the truth.

How can we model things digitally which we take for granted in the physical world?

Encryption and Hashing

Encryption and hashing use cryptography to provide some guarantees which are simple to understand, even if how they are being guaranteed is a little on the mind-bending side. These guarantees might seem just neat at first, but when combined, create tools which are used everywhere today and have endless potential.

Encryption is a mapping of any piece of data and a key to another piece of data: data + key => other_data. But encryption provides some interesting guarantees (backed by the maths) about this relationship.

+-------------+   +---------------+
|             |   |               |
| data1+key---+---+-->other_data1 |
|             |   |               |
|             |   |               |
|  data2+key--+---+-->other_data2 |
|             |   |               |
|             |   |               |
| data3+key---+---+-->other_data3 |
|             |   |               |
|             |   |               |
|  data4+key--+---+-->other_data4 |
|             |   |               |
+-------------+   +---------------+

Encryption is a one-to-one relationship.

It is impossible to calculate the original data from other_data without the key.
It is impractical to even try guessing (would take mind-bending amounts of time).
Changing the input data or key just a little, results in a completely different output, so knowing data doesn’t tell you anything about other_data.

So encryption is a practical way to scramble data.

Hashing is a mapping of any piece of data to a much smaller (fixed sized, really tiny) piece of data: data => small_data. Hashing has similar guarantees to encryption, with a twist of its own.

+----------+     +----------------+
|          |     |                |
|  data1---+--+--+-->small_data1  |
|          |  |  |                |
|          |  |  |                |
|  data2---+--+  |                |
|          |     |                |
|          |     |                |
|   data3--+-----+-->small_data2  |
|          |     |                |
|          |     |                |
| data4----+-----+--->small_data3 |
|          |     |                |
+----------+     +----------------+

Hashing is a many-to-one relationship.

Like encryption, it’s impossible to calculate data from small_data.
Also impractical to even try guessing.
Changing the input data just a little also results in a completely different output, so knowing data doesn’t tell you anything about small_data.

What is the twist? That many-to-one relation implies that different input data maps to same output. A collision! But, and we have to trust the mind-bending maths again here, the chances of a collision are just so, so, so, SO small…we just don’t worry about it.

So hashing is a practical way to identify data.

Another use case for hashing is called a commitment. It involves at least two parties and an order of operations, protocol of sorts. If someone wants to prove they know something to another person, without revealing that something, they could instead reveal the hash of that something. Now, the hash by itself proves nothing at this point. But if at a later point they also reveal original something, called the preimage, all other parties can hash it themselves and check if it matches. If the hashes match, all parties know that the original party knew the preimage when they first gave the hash. It was a commitment to the preimage without revealing the preimage. The only other option is that the person guessed the hash which is extremely unlikely.

symmetric and asymmetric

Encryption comes in two forms: symmetric and asymmetric. These have to do with the type of key used in the mapping.

Symmetric is when the same key is used to encrypt and decrypt the data. This is straightforward to understand, but limits the use cases. One of the obvious use cases of encryption is to transfer date without others being able to read it. The data is safe to transfer since it’s scrambled, but how do you transfer the key?

Enter asymmetric encryption. Asymmetric encryption uses more of that mind-bending maths to create two keys which have a nifty relationship: encryption with one can only be decrypted with the other. The “how do you transfer the key?” question can now be answered:

The data recipient (let’s call her Alice) sends her first key (let’s call it her public key) to the data sender (let’s call him Bob).
Bob encrypts the data with Alice’s public key and sends her the encrypted data.
Alice decrypts the data using the second key in her key pair (let’s call that her private key).

merkle trees

A merkle tree, a tree of hashes, is a data structure which leverages the commitment characteristic of hashes offering some interesting trade-offs. The structure is straightforward.

A tree of hashes based on four data elements.

In the example above, there are four chunks of data d1, d2, d3, d4. Each chunk is hashed with a cryptographic hash function, like SHA-256, to get the first “level” of hashes h1, h2, h3, h4. If using SHA-256 each hash is 256 bits as the name implies also known as 32 bytes. Now the fun tree part begins. If the tree is structured like a binary tree, as it is above, sibling nodes are hashed together to get a parent node. This operation is continued to the top until we have one root hash. The root hash, like all the other hashes in the tree, is 32 bytes of data.

The root hash is a commitment to all the data leaves of the tree. If at some point in time you know this hash, it means you know the set of data elements. This allows parties to exchange knowledge of a set without having to necessarily send the whole set of data, but instead just 32 bytes (in the case of SHA-256).

One use case is to validate data. If a user wants to check with another that their data sets match, they can send over a merkle tree root. The validator can then calculate the root based on their set of data and check that it matches. Any change in the data, no matter the size, results in a different root hash.

Cool, but why the tree structure then? The same could be accomplished by just concatenating the data elements and hashing them saving on hash calculations. The power of the tree becomes clear when looking at fewer elements in the set.

Let’s say Alice has the root hash, which she trusts, and Bob wants to prove to her that d3 is in the tree. Bob wants to prove membership. He could send Alice all four data elements and let Alice do all the hashing, but there is a more efficient way. What if he instead sends d3 along with h4 and h5. Alice can hash d3 to get h3, and then hash that with the given h4 to get h6. She can then hash that with the given h5 to get the root h7. As long as that matches the root that Alice has, Bob has effectively proved that d3 was committed. The proof that Bob provided is referred to as the merkle path, all the node hashes required to get from a data element to the root hash.

Path is kind of a confusing term since the “path” to d3 from the root is h6 to h3 to d3. But that path is totally useless to Alice the validator because she is not able to use them to prove d3 is committed by the root. They just look like random 32 byte blobs to her. Instead, the merkle path is made up of the hash siblings. The siblings effectively commit to subsections of the tree, which cover more and more elements as you go up.

Only four data elements doesn’t really show off the log power of the tree. The proof required two 32 byte siblings, so 64 bytes total. Depending on the size of the data elements, that might not even be smaller than just sending all four data elements! But you start pouring more data into that tree and the proof only grows with each new “level” (a.k.a log_2(n)). 8 elements could fit into a tree with one more layer, so three proofs. 16 elements to 4 proofs, 32 to 5, 64 to 6, 128 to 7, 256 to 8, 512 to 9, 1024 to 10…and so on.

You could use a tree with a higher branching factor than the two of a binary tree, but the tradeoff doesn’t produce a win. The height of the tree goes down as the branching factor goes up, log_x(elements). But in a binary tree, just one hash represents all the siblings to compute a parent node (only 2 nodes per level). A branching factor of three, a ternary tree, requires two other siblings to compute a parent. So the proof size won’t necessarily be smaller due to the smaller height of the overall tree because more sibling hashes are required. Since the height decreases logarithmically with the branching factor, but the number of siblings increases linearly, it actually doesn’t work out at all.

One interesting requirement for merkle trees is a determinitic way to order data elements. Without such an agreed upon definition, different root hashes would be calculated with the same set of data elements.

Schemes

Cryptography primitives are layered together in schemes to provide some high level operation. Schemes have some common desired characteristics.

forward and backward secrecy

A lot of encryption protocols between two parties depend on some sort of shared secret which is used in a scheme to encrypt and decrypt data. The schemes can be described by how well they handle the secret key being stolen. Does that mean all data ever sent is now public? Or just new data? Or neither?

A scheme can have Forward Secrecy, Backward Secrecy, or both! But since it involves time the names are a bit confusing. In any case though, these traits reduce the motivation for attackers since they can’t decode some ciphertext even if they have the secret keys. The cost is the complexity of the protocol goes up.

A scheme has Forward Secrecy (a.k.a. perfect forward secrecy) if past communications are protected from future key compromises. A scheme has Backwards Secrecy (a.k.a. future secrecy) if future communications are protected against past key compromises. So the common names, forward and backwards, are from the key compromise perspective not the communication. So if you are communication with a buddy, forward keys can’t hack you and backwards keys can’t hack you.

So how are these implemented? If there is just one secret key for the whole lifetime of a “channel” between two parties, it obviously doesn’t have either forward or backwards secrecy. Everything is compromised. More keys need to get involved!

    coms1  coms2  coms3  coms4  coms6  coms6
------|------|------|------|------|------|------> 
     key1   key2   key3   key4   key5   key6

A timeline of communications and the key used at that point.

A way to achieve forward secrecy is to generate a new key for every communication. If the keys are totally unrelated, then getting key5 doesn’t help an attacker decrypt coms3. This can be implemented with both parties performing a Diffie-Hellman per communication. It is a lot of overhead, but possible and usually at least part of any scheme. Since the keys are totally unrelated, backwards secrecy is also achieved. key2 also cannot help an attacker with coms3.

A DH exchange per communication is a whole ton of each party having to pick random numbers and pass then perform the material exchange handshake. That is high costs in computation, latency, and bandwidth. Not great! What if we perform just one DH exchange at the beginning of a channel, and then each party derives keys independently without all the handshaking? You could do something as simple as key2 is the hash of key1. This drops the overhead a ton, and keeps forward secrecy since hashes are one-way functions, knowing key2 doesn’t get you any closer to knowing key1. But if an attacker gets key1, they can derive all the following keys, so backwards secrecy is gone.

Another layer needs to be mixed into maintain backwards secrecy, but also low overhead. The Key Derivation Function (KDF) needs another input other than the key before. So key_n+1 = KDF(key_n || $SOMETHING), but what something? Protocols can agree on all sorts of things, but a common one is a message counter. That way an attacker needs to not only know a key, but also the current counter between the parties in order to derive the next key. The more inputs added the harder for an attacker to crack the backwards secrecy.

constant and variable time

When we talk constant time in Big O notation, we are saying that a function takes the same amount of time to run regardless of input size. But Big O is generally focused on performance. For cryptographic implementations, performance is very important since users will just skip using things like encryption if it is too expensive. But just as important is security.

For a function to be secure it cannot leak any information about secret materials. This includes side channel attacks where an attack might gain information in weird ways, like measuring how long it takes running a function for all sorts of inputs. The goal is for a function to run in constant time no matter the input. So this includes inputs of different size, but also any other characteristic of the input.

One might think, well, the secret material of cryptographic implementation is usually a fixed length (e.g. a private key) does that protect us? It does limits a lot of variable runtime issues. A for loop over each element will be constant time. But that shouldn’t give you too much confidence because there are a whole ton of other things which can leak information which are not dependent on input length at all. For example, something as simple as an early return statement. Or a comparison. Or and indexing operations. All of these will run differently given different inputs.

“Constant time” is a misnomer, it is “memory access and code branching does not depend on secret information”.
SIPA

There are some patterns which can help out. Like avoiding branching if-else statements and instead use bitwise operations. This is a general trend where the code gets less performant and harder to read, but is now constant time no matter the input.

fn constant_time_select(condition: bool, a: u32, b: u32) -> u32 {
    // Convert bool to u32 (0 or 1).
    let mask = condition as u32;
    
    // Create a mask that's all 1s if condition is true, all 0s if false.
    let mask = mask.wrapping_neg();
    
    // Use the mask to select either a or b.
    (mask & a) | (!mask & b)
}

fn main() {
    let secret = true;
    let x = 42;
    let y = 100;
    
    // Non-constant time (vulnerable) version.
    let result_vulnerable = if secret { x } else { y };
    
    // Constant time version.
    let result_constant = constant_time_select(secret, x, y);
    
    println!("Vulnerable: {}", result_vulnerable);
    println!("Constant-time: {}", result_constant);
}

Constant time selection.

authentication

The term authentication is used to describe a handful of properties, which is confusing. Maybe not so surprising though since “Is this thing authentic?” is a question that comes up a lot in cryptographic protocols.

Entity Authentication

Confirm the identity of a counter-party. In SSH for example, this is done with public keys. There is an assumption in the trust model that a user already knows the public key of someone. Some sort of out-of-band communication is require for any entity authentication.

Message Integrity Authentication

Verify a message hasn’t been tampered with. Like a checksum, but can’t be forged (cryptographically secure).

Message Origin Authentication

Proving who sent a message. Slightly different than #1.

Protocol Authentication

Proving a protocol was executed. Used in zero-knowledge protocols.

There may be more! But I think you get the point. The BIP324 protocol calls out that it does not require authentication, and in that case, we are talking about entity authentication. The protocol actually involves both message integrity and origin authentication through the use of the Poly1305 message authentication (there it is again…) code (MAC).

Finite Fields

Cryptography loves finite fields! They show up pretty much everywhere.

It is also common to layer groups and fields. In elliptic curve math, the “top” layer is the set of points on the curve and addition. Those are across the “bottom” layer of a finite field of integers modulus a prime number. Another example is the polynomial ring layer over the super nifty modulus two, GF(2), field used in AES’s Rijndael cipher.

That polynomial ring layer is a weird one, but I think super powerful for a lot of reasons (many of which I don’t know). One that is clear in the cryptography domain though, is the addition operation over the field is just a bitwise XOR. So, very efficient for CPUs to calculate.

prime numbers

Prime numbers show up a bunch in cryptographic implementations. But why?

A prime number’s only factors are itself and 1. This seems kinda random, but this characteristic has made it a building block for a ton of math, including cryptography things. Prime’s are “building blocks” because they cannot be decomposed into smaller numbers like composites. For example, the composite number 12 can be broken down into the primes 2*2*3. This breakdown is unique, there is only one set of primes a composite breaks down to, and this is so important it is called the Fundamental Theorem of Arithmetic!

If two prime numbers are multiplied together, the result can only be factored by 1, the two primes, or the result itself. This is called a semiprime.

A ton of research has been poured in to figuring out how to quickly find the factors of a number. Does an algorithm exist which is more efficient than just guessing numbers which could possibly be factors and checking real quick. The answer at the moment appears to be “no”, but we also haven’t proven that the answer is no. This “one way” function aspect though is the crux of a lot of cryptography. If you take two prime numbers and multiply them to get a semiprime, you now have a number which you know only has 4 factors. And you know the two (the original primes) which would be really hard to calculate.

If two numbers have no prime factors in common, they are described as relatively prime (or coprime or mutually prime). This means their greatest common denominator is 1. Obviously if either of the two numbers is prime, it goes a long way for them to be relatively prime.

modulus

In cryptography, prime numbers are often combo’d with the modulus binary operation. The modulus operation is dealing with the remainder of what is left after dividing n by m.

x mod y = x - y*floor(x/y)

Helpful way to view the mod binary operation.

Seeing how the modulus relates to division, and primes can’t be divided, you can start to get a feel for why these two things may be a potent combo.

The combo is often seen “modulo over a prime” in order to define finite fields. A field is a set which has two binary operators, commonly called “addition” and “multiplication”. These operators have to follow a bunch of rules for the field to be considered a field, one of which is that the operation is “closed”. This means the operations takes values from the set as input and outputs a value from the same set. Addition is pretty straight forward, but multiplication over a modulo can cause some issues here. If the modulo is over a composite, not a prime, than the output set is smaller than the input set. This is easiest to see with a multiplication table.

// Over a prime.
  | 0 1 2 3 4
--+----------
0 | 0 0 0 0 0
1 | 0 1 2 3 4
2 | 0 2 4 1 3
3 | 0 3 1 4 2
4 | 0 4 3 2 1

// Over a composite.
  | 0 1 2 3
--+--------
0 | 0 0 0 0
1 | 0 1 2 3
2 | 0 2 0 2
3 | 0 3 2 1

Multiplication over a composite modulo shrinks the set, so it isn’t closed, so it is not a field.

If a field is over a prime p, every member of that field is relatively prime with p. Another way to describe multiplication is “repeatedly sum up a value x, y times, starting at 0”. Taking a look at the top grid above, x and y are the two axis, and for a fixed x column (or row, doesn’t really matter how you view it) you can see every member of the set. In other words y*x is different for every y. If a member showed up twice in a column, that would mean that x and m would have to have another factor.

So if x and m are relatively prime, there exists only one y such that y*x = 1 (mod m). These are multiplicative inverses since they cancel each other out: (y*x*z) % m == 1*z % m == z % m.

xor

The exclusive OR operator has a few characteristics which make it very useful. It is also simple to implement and cheap.

    0 1
--+----
0 | 0 1
1 | 1 0

XOR table, notice the balance of 1s and 0s.

A XOR 0 = A
A XOR A = 0

A XOR B = B XOR A

(A XOR B) XOR B = A

Fun XOR things. That last one is extra intriguing.

While XOR by itself is not super secure, you can combo it with other strategies to easily mask and un-mask data leveraging the two properties A XOR 0 = A and A XOR A = 0. Algebraicly, this means XOR’ing twice by the same value leaves you with the original value.

one-way and trapdoor

The terms often get used interchangeably, but a trapdoor function is a special type of one-way function. It has an extra feature where it is possible to invert the function if you have a piece of secret information.

linearity

If all parts of a cipher are linear, it makes it much easier to attack. An attacker could take ciphertext and easily remove huge chunks of possible plaintexts. This is how many efficient algorithms operate (e.g. gaussian elimination or binary search), but must not be a characteristic of a cipher. Ideally, all plaintexts have equal possibility of being a ciphertext. What we want is a bent function.

What is linear depends on a field’s notion of addition and multiplication. A linear relationship has the shape f(t) = c + g(t) where c is a constant and g has the property that g(a + b) = g(a) + g(b), distributes over addition. So over real numbers, g(x) can only be scalar multiplication (a.k.a. multiply by a constant). Variables are never multiplied by each other – they are only ever multiplied by constants in a linear function. This means the rate of change is a constant. At any point, the rate of change is the same. Compare that to exponential or even polynomials. With linear, you only need two points to know the whole line. They are simple relationships. There are infinite non-linear relationships, but only a subset are good for cryptographic purposes, and they often have characteristics that are the opposite of linear functions which maybe explains the callout on linear functions.

Linearity can be measured by approximating a function with the “best” linear version (I would guess this gets hard to find with higher complexity functions) of the function. The more overlap with same input/outputs with the approximate linear function, the worse the function is at being non-linear.

Linear functions of different spaces can be combined to form a non-linear function in both. This is a little tricky. If F is over a field, but has an internal step which happens to be linear over another field, that doesn’t much matter to F. Maybe the part which is a real bender is interpreting a value in one field and then in another mid-function? This is a tactic used to break up linearity in the ARX-based rounds, where there is addition over a finite field of 2^N, where N is common register size like 32 or 64, and that is combo’d with an XOR operation, which is addition over a polynomial ring GF(2^N). The numbers in both these fields are 1:1, even though the operations are not related. So I don’t think there is any *-morphisim between the fields, but it is still possible to flip back and forth between the operation outputs.

https://github.com/LLFourn/secp256kfun • https://berry.win.tue.nl/CryptographicProtocols/LectureNotes.pdf • https://loup-vaillant.fr/tutorials/cofactor • https://elligator.org/