Cryptography

// Bringin math to social problems

Do not try and bend the spoon. That's impossible. Instead... only try to realize the truth.

Encryption and Hashing

At the bottom (or as deep as I am willing to go right now) we have encryption and hashing. Both of these patterns use cryptography to provide some guarantees which are simple to understand, even if how they are being guaranteed is a little on the mind-bending side. These guarantees might seem just neat at first, but when combined, create tools which are used everywhere today and have endless potential.

Encryption is a mapping of any piece of data and a key to another piece of data: data + key => other_data. But encryption provides some interesting guarantees (backed by the maths) about this relationship.

+-------------+   +---------------+
|             |   |               |
| data1+key---+---+-->other_data1 |
|             |   |               |
|             |   |               |
|  data2+key--+---+-->other_data2 |
|             |   |               |
|             |   |               |
| data3+key---+---+-->other_data3 |
|             |   |               |
|             |   |               |
|  data4+key--+---+-->other_data4 |
|             |   |               |
+-------------+   +---------------+

Encryption is a one-to-one relationship.

  1. It is impossible to calculate the original data from other_data without the key.
  2. It is impractical to even try guessing (would take mind-bending amounts of time).
  3. Changing the input data or key just a little, results in a completely different output, so knowing data doesn’t tell you anything about other_data.

So encryption is a practical way to scramble data.

Hashing is a mapping of any piece of data to a much smaller (fixed sized, really tiny) piece of data: data => small_data. Hashing has similar guarantees to encryption, with a twist of its own.

+----------+     +----------------+
|          |     |                |
|  data1---+--+--+-->small_data1  |
|          |  |  |                |
|          |  |  |                |
|  data2---+--+  |                |
|          |     |                |
|          |     |                |
|   data3--+-----+-->small_data2  |
|          |     |                |
|          |     |                |
| data4----+-----+--->small_data3 |
|          |     |                |
+----------+     +----------------+

Hashing is a many-to-one relationship.

  1. Like encryption, it’s impossible to calculate data from small_data.
  2. Also impractical to even try guessing.
  3. Changing the input data just a little also results in a completely different output, so knowing data doesn’t tell you anything about small_data.

What is the twist? That many-to-one relation implies that different input data maps to same output. A collision! But, and we have to trust the mind-bending maths again here, the chances of a collision are just so, so, so, SO small…we just don’t worry about it.

So hashing is a practical way to identify data.

Another use case for hashing is called a commitment. It involves at least two parties and an order of operations, protocol of sorts. If someone wants to prove they know something to another person, without revealing that something, they could instead reveal the hash of that something. Now, the hash by itself proves nothing at this point. But if at a later point they also reveal original something, called the preimage, all other parties can hash it themselves and check if it matches. If the hashes match, all parties know that the original party knew the preimage when they first gave the hash. It was a commitment to the preimage without revealing the preimage. The only other option is that the person guessed the hash which is extremely unlikely.

symmetric and asymmetric

Encryption comes in two forms: symmetric and asymmetric. These have to do with the type of key used in the mapping.

Symmetric is when the same key is used to encrypt and decrypt the data. This is straightforward to understand, but limits the use cases. One of the obvious use cases of encryption is to transfer date without others being able to read it. The data is safe to transfer since it’s scrambled, but how do you transfer the key?

Enter asymmetric encryption. Asymmetric encryption uses more of that mind-bending maths to create two keys which have a nifty relationship: encryption with one can only be decrypted with the other. The “how do you transfer the key?” question can now be answered:

  1. The data recipient (let’s call her Alice) sends her first key (let’s call it her public key) to the data sender (let’s call him Bob).
  2. Bob encrypts the data with Alice’s public key and sends her the encrypted data.
  3. Alice decrypts the data using the second key in her key pair (let’s call that her private key).

signatures

Digital signatures compose hashing and encryption. What is the use case? Data shooting across the internet goes through a lot of intermediate servers and routers which are not controlled by the data sender or data receiver. It is possible that the intermediate servers could tamper with some data or pretend to be a sender. A data receiver wants to verify who sent data and what data did they send? The data needs a signature.

The signature pattern begins with a small chunk of data sent alongside the original data. A message authentication code (MAC). How does this chunk of data allow a receiver to verify who and what?

  1. The data sender uses hashing to create a small identifier for the data being sent.
  2. The data sender encrypts the small identifier (this is a Hashed MAC or HMAC).
  3. The data sender sends the HMAC along with the data to the data receiver.
  4. The data receiver decrypts the HMAC and hashes the data.
  5. If the two values match they know who sent the data and that it hasn’t been tampered with.

If the sender and receiver are the same person (e.g. maybe sending a data to a different storage) symmetric encryption can be used since the key never needs to be sent. Asymmetric encryption is used if the sender and receiver are different people.

+------+
| Data |
+--+---+
   |
   | Hash
   |
   v
 +----+          +---------+
 | ID +--------->|Signature|
 +----+          +---------+
        Encrypt

Generate HMAC signature.

+------+    +---------+
| Data |    |Signature|
+--+---+    +----+----+
   |             |
   | Hash        | Decrypt
   |             |
   v             v
 +----+        +----+
 | ID |   ==   | ID |
 +----+        +----+

Validate data with signature.

Prime Numbers

Prime numbers show up a bunch in cryptographic implementations. But why?

A prime number’s only factors are itself and 1. This seems kinda random, but this characteristic has made it a building block for a ton of math, including cryptography things. Prime’s are “building blocks” because they cannot be decomposed into smaller numbers like composites. For example, the composite number 12 can be broken down into the primes 2*2*3. This breakdown is unique, there is only one set of primes a composite breaks down to, and this is so important it is called the Fundamental Theorem of Arithmetic!

If two prime numbers are multiplied together, the result can only be factored by 1, the two primes, or the result itself. This is called a semiprime.

A ton of research has been poured in to figuring out how to quickly find the factors of a number. Does an algorithm exist which is more efficient than just guessing numbers which could possibly be factors and checking real quick. The answer at the moment appears to be “no”, but we also haven’t proven that the answer is no. This “one way” function aspect though is the crux of a lot of cryptography. If you take two prime numbers and multiply them to get a semiprime, you now have a number which you know only has 4 factors. And you know the two (the original primes) which would be really hard to calculate.

If two numbers have no prime factors in common, they are described as relatively prime (or coprime or mutually prime). This means their greatest common denominator is 1. Obviously if either of the two numbers is prime, it goes a long way for them to be relatively prime.

modulus

In cryptography, prime numbers are often combo’d with the modulus binary operation. The modulus operation is dealing with the remainder of what is left after dividing n by m.

x mod y = x - y*floor(x/y)

Helpful way to view the mod binary operation.

Seeing how the modulus relates to division, and primes can’t be divided, you can start to get a feel for why these two things may be a potent combo.

The combo is often seen “modulo over a prime” in order to define finite fields. A field is a set which has two binary operators, commonly called “addition” and “multiplication”. These operators have to follow a bunch of rules for the field to be considered a field, one of which is that the operation is “closed”. This means the operations takes values from the set as input and outputs a value from the same set. Addition is pretty straight forward, but multiplication over a modulo can cause some issues here. If the modulo is over a composite, not a prime, than the output set is smaller than the input set. This is easiest to see with a multiplication table.

// Over a prime.
  | 0 1 2 3 4
--+----------
0 | 0 0 0 0 0
1 | 0 1 2 3 4
2 | 0 2 4 1 3
3 | 0 3 1 4 2
4 | 0 4 3 2 1

// Over a composite.
  | 0 1 2 3
--+--------
0 | 0 0 0 0
1 | 0 1 2 3
2 | 0 2 0 2
3 | 0 3 2 1   

Multiplication over a composite modulo shrinks the set, so it isn’t closed, so it is not a field.

If a field is over a prime p, every member of that field is relatively prime with p. Another way to describe multiplication is “repeatedly sum up a value x, y times, starting at 0”. Taking a look at the top grid above, x and y are the two axis, and for a fixed x column (or row, doesn’t really matter how you view it) you can see every member of the set. In other words y*x is different for every y. If a member showed up twice in a column, that would mean that x and m would have to have another factor.

So if x and m are relatively prime, there exists only one y such that y*x = 1 (mod m). These are multiplicative inverses since they cancel each other out: (y*x*z) % m == 1*z % m == z % m.

finite fields

Cryptography loves finite fields! They show up pretty much everywhere.

It is also common to layer groups and fields. In elliptic curve math, the “top” layer is the set of points on the curve and addition. Those are across the “bottom” layer of a finite field of integers modulus a prime number. Another example is the polynomial ring layer over the super nifty modulus two, GF(2), field used in AES’s Rijndael cipher.

That polynomial ring layer is a weird one, but I think super powerful for a lot of reasons (many of which I don’t know). One that is clear in the cryptography domain though, is the addition operation over the field is just a bitwise XOR. So, very efficient for CPUs to calculate.

XOR

The exclusive OR operator shows up a bunch in cryptography. It has a few characteristics which make it very useful in this context. It is also simple to implement and cheap.

    0 1
--+----
0 | 0 1
1 | 1 0 

XOR table, notice the balance of 1s and 0s.

A XOR 0 = A
A XOR A = 0

A XOR B = B XOR A

(A XOR B) XOR B = A 

Fun XOR things. That last one is extra intriguing.

While XOR by itself is not super secure, you can combo it with other strategies to easily mask and un-mask data leveraging the two properties A XOR 0 = A and A XOR A = 0. Algebraicly, this means XOR’ing twice by the same value leaves you with the original value.

Linearity

If all parts of a cipher are linear, it makes it much easier to attack. An attacker could take ciphertext and easily remove huge chunks of possible plaintexts. This is how many efficient algorithms operate (e.g. gaussian elimination or binary search), but must not be a characteristic of a cipher. Ideally, all plaintexts have equal possibility of being a ciphertext. What we want is a bent function.

What is linear depends on a field’s notion of addition and multiplication. A linear relationship has the shape f(t) = c + g(t) where c is a constant and g has the property that g(a + b) = g(a) + g(b), distributes over addition. So over real numbers, g(x) can only be scalar multiplication (a.k.a. multiply by a constant). Variables are never multiplied by each other – they are only ever multiplied by constants in a linear function. This means the rate of change is a constant. At any point, the rate of change is the same. Compare that to exponential or even polynomials. With linear, you only need two points to know the whole line. They are simple relationships. There are infinite non-linear relationships, but only a subset are good for cryptographic purposes, and they often have characteristics that are the opposite of linear functions which maybe explains the callout on linear functions.

Linearity can be measured by approximating a function with the “best” linear version (I would guess this gets hard to find with higher complexity functions) of the function. The more overlap with same input/outputs with the approximate linear function, the worse the function is at being non-linear.

Linear functions of different spaces can be combined to form a non-linear function in both. This is a little tricky. If F is over a field, but has an internal step which happens to be linear over another field, that doesn’t much matter to F. Maybe the part which is a real bender is interpreting a value in one field and then in another mid-function? This is a tactic used to break up linearity in the ARX-based rounds, where there is addition over a finite field of 2^N, where N is common register size like 32 or 64, and that is combo’d with an XOR operation, which is addition over a polynomial ring GF(2^N). The numbers in both these fields are 1:1, even though the operations are not related. So I don’t think there is any *-morphisim between the fields, but it is still possible to flip back and forth between the operation outputs.

Forward and Backward Secrecy

A lot of encryption protocols between two parties depend on some sort of shared secret which is used in a scheme to encrypt and decrypt data. The schemes can be described by how well they handle the secret key being stolen. Does that mean all data ever sent is now public? Or just new data? Or neither?

A scheme can have Forward Secrecy, Backward Secrecy, or both! But since it involves time the names are a bit confusing. In any case though, these traits reduce the motivation for attackers since they can’t decode some ciphertext even if they have the secret keys. The cost is the complexity of the protocol goes up.

A scheme has Forward Secrecy (a.k.a. perfect forward secrecy) if past communications are protected from future key compromises. A scheme has Backwards Secrecy (a.k.a. future secrecy) if future communications are protected against past key compromises. So the common names, forward and backwards, are from the key compromise perspective not the communication. So if you are communication with a buddy, forward keys can’t hack you and backwards keys can’t hack you.

So how are these implemented? If there is just one secret key for the whole lifetime of a “channel” between two parties, it obviously doesn’t have either forward or backwards secrecy. Everything is compromised. More keys need to get involved!

    coms1  coms2  coms3  coms4  coms6  coms6
------|------|------|------|------|------|------> 
     key1   key2   key3   key4   key5   key6 

A timeline of communications and the key used at that point.

A way to achieve forward secrecy is to generate a new key for every communication. If the keys are totally unrelated, then getting key5 doesn’t help an attacker decrypt coms3. This can be implemented with both parties performing a Diffie-Hellman per communication. It is a lot of overhead, but possible and usually at least part of any scheme. Since the keys are totally unrelated, backwards secrecy is also achieved. key2 also cannot help an attacker with coms3.

A DH exchange per communication is a whole ton of each party having to pick random numbers and pass then perform the material exchange handshake. That is high costs in computation, latency, and bandwidth. Not great! What if we perform just one DH exchange at the beginning of a channel, and then each party derives keys independently without all the handshaking? You could do something as simple as key2 is the hash of key1. This drops the overhead a ton, and keeps forward secrecy since hashes are one-way functions, knowing key2 doesn’t get you any closer to knowing key1. But if an attacker gets key1, they can derive all the following keys, so backwards secrecy is gone.

Another layer needs to be mixed in to maintain backwards secrecy, but also low overhead. The Key Derivation Function (KDF) needs another input other than the key before. So key_n+1 = KDF(key_n || $SOMETHING), but what something? Protocols can agree on all sorts of things, but a common one is a message counter. That way an attacker needs to not only know a key, but also the current counter between the parties in order to derive the next key. The more inputs added the harder for an attacker to crack the backwards secrecy.

Constant and Variable Time

When we talk constant time in Big O notation, we are saying that a function takes the same amount of time to run regardless of input size. But Big O is generally focused on performance. For cryptographic implementations, performance is very important since users will just skip using things like encryption if it is too expensive. But just as important is security.

For a function to be secure it cannot leak any information about secret materials. This includes side channel attacks where an attack might gain information in weird ways, like measuring how long it takes running a function for all sorts of inputs. The goal is for a function to run in constant time no matter the input. So this includes inputs of different size, but also any other characteristic of the input.

One might think, well, the secret material of cryptographic implementation is usually a fixed length (e.g. a private key) does that protect us? It does limits a lot of variable runtime issues. A for loop over each element will be constant time. But that shouldn’t give you too much confidence because there are a whole ton of other things which can leak information which are not dependent on input length at all. For example, something as simple as an early return statement. Or a comparison. Or and indexing operations. All of these will run differently given different inputs.

There are some patterns which can help out. Like avoiding branching if-else statements and instead use bitwise operations. This is a general trend where the code gets less performant and harder to read, but is now constant time no matter the input.

fn constant_time_select(condition: bool, a: u32, b: u32) -> u32 {
    // Convert bool to u32 (0 or 1).
    let mask = condition as u32;
    
    // Create a mask that's all 1s if condition is true, all 0s if false.
    let mask = mask.wrapping_neg();
    
    // Use the mask to select either a or b.
    (mask & a) | (!mask & b)
}

fn main() {
    let secret = true;
    let x = 42;
    let y = 100;
    
    // Non-constant time (vulnerable) version.
    let result_vulnerable = if secret { x } else { y };
    
    // Constant time version.
    let result_constant = constant_time_select(secret, x, y);
    
    println!("Vulnerable: {}", result_vulnerable);
    println!("Constant-time: {}", result_constant);
}
  

Constant time selection.