# Co487 Assignments Definition

0 Comments / 1 / By

## CO487

Applied cryptography.

## 4/3/16

All course resources are on LEARN. Course has 5 assignments (summing up to 20% of final grade), midterm (worth 30%), and a final exam (worth 50%). Midterm is March 8, 2017 at 7-9pm.

Cryptography is a tool we can use to secure communications when there are malicious adversaries. Some of the fundamental goals of cyptography are:

• Confidentiality - data is secret to all people except authorized parties.
• Data Integrity - data is unaltered.
• Origin authentication - data can be confirmed to be from a particular source.
• Non-repudiation - once a statement is published, it is not possible to deny/revoke it later.

Different applications might require different subsets of these goals.

In this course, we use Bob and Alice as placeholders to represent two parties trying to communicate securely. We use Eve or Mallory as a placeholder to represent one or more adversaries trying to attack the communications (by injecting data, eavesdropping, or other approaches, depending on the threat model)

Some famous early practical uses of cryptography was the enigma machine, which used multiple spinning rotors to encrypt/decrypt messages. In that cryptosystem, the German military leadership and the German U-boats had the role of Bob and Alice, trying to communicate confidentially, and Alan Turing and his team had the role of Eve, trying to break the confidentiality guarantees and read those communications.

However, cryptosystems like Enigma and its successor, Lorenz, are a far cry from modern cryptosystems, which have a much more mathematically sound foundation. Modern cryptography is what makes online banking, online shopping, and cellular networks possible.

For example, SSL, the protocol that makes online security possible, ensures origin authentication and confidentiality between a user and a website. SSL uses symmetric-key encryption to implement confidentiality, and a MAC scheme (HMAC) to implement origin authentication. Of course, to make this possible they both need to have the same secret key for the symmetric-key cryptography, and for the MAC. To confidentially and authentically share this secret key, we use public-key encryption. Of course, to make this possible we need a way to obtain authentic copies of the public keys of both parties - this is implemented using digital signatures, where trusted third parties known as certificate authorities keep track of public keys that are considered authentic (generally by vetting the site owner in the real world), and digitally sign them to confirm that they're authentic. The certificate authorities' public keys are pre-installed in the browser, which acts as a root of trust - if you trust that certificate authority's public key is authentic, and that the certificate authorities are correctly keeping track of websites' authentic public keys, then you can also trust that the website's presented public key is also authentic.

SSL is one of the most successful cryptosystems ever deployed, used for tons of sites on the internet. However, there are a number of possible weaknesses that might result in the guarantees being broken:

• The crypotgraphy itself might be weak (e.g., previously, SSL supportied using RC4 for the symmetric-key encryption, which is nowadays easy to break).
• Quantum computers can attack public-key encryption systems like RSA.
• Random number generators used in the protocol might have weaknesses (e.g., Netscape's flawed RNG, NSA's backdoored EC-DRBG CSPRNG).
• Certificate authorities (e.g., social engineering resulting in Verisign digitally signing non-authentic certificates).
• Bugs in crpytographic code (e.g., Heartbleed)
• Misuse of cryptography, like encrypting the wrong thing.
• Phishing/social engineering attacks on users themselves.
• The server itself leaking data - SSL only protects data in transit, not when it's on the server.

This demonstrates some useful concepts:

• Symmetric-key encryption is used to ensure confidentiality.
• MAC schemes are used to ensure authentication.
• Public-key encryption is used to ensure confidentiality, integrity, authentication, and non-repudiation.
• Digital signatures is used to ensure integrity, authentication, and non-repudiation.

Cryptography is only one part of a large information security ecosystem, including other things like secure operating systems, auditing mechanisms, trusted computing, and risk analysis. In this context, cryptography provides a lot of useful and essential tools, but it's not all there is to security. Attackers will generally target the weakest part of the system, and if that link fails, it is possible that the entire system will.

This course focuses on breadth over depth. For depth, take CO 485 and try out the readings for this course as well.

## 6/1/17

SSL in more detail:

1. Client makes request to the website (not the full URL, just the host).
2. Server responds with the website's certificate, which contains the website identification info (like host and etc.).

## Symmetric-key encryption

A symmetric key encryption scheme (SKES) is a definition containing:

• A plaintext space M.
• A ciphertext space C.
• A key space K.
• A family of encryption functions E_k: M \to C, \forall k \in K.
• A family of decryption functions D_k: C \to M, \forall k \in K.

To use one of these to implement message confidentiality, Alice and Bob agree on a particular k \in K over a secure channel (so that nobody else knows about the value of k). Then, Alice can compute c = E_k(m), where m is the message, and then sends c to Bob over an unsecure channel. Bob can read the message by then computing m = D_k(c), while anyone without k would have to figure out the correct D_k function to use. An SKES can be designed such that finding the correct D_k would be very difficult.

A substitution cipher is an SKES, under this definition: M is the set of all English messages, C is the set of all encrypted messages, K is the set of all permutations of the English alphabet, E_k maps letters onto k by index, and D_k does the inverse mapping.

A security model defines what the adversay is capable of, like what they can do to the communicating parties. For example, there are passive attacks like ciphertext attacks (attacker can obtain ciphertext, like if they're listening on the same network), known-plaintext attacks (attacker knows some of the plaintext as well the resulting ciphertext, like knowing that someone always signs off their message with their name). There are also active attacks, like chosen-plaintext attacks (attacker can choose some part of the plaintext), clandestine attacks (attacker is willing to use bribery/blackmail/etc.), and side-channel attacks (power analysis, RF emissions analysis, timing attacks).

The security model also includes the attacker's computational abilities. For example, information-theoretic security models assume the attacker has infinite computational resources, complexity-theoretic security models assume the attacker has a turing machine capable of computing any polynomial-time algorithm efficiently, and computational-theoretic security models assume the attacker simply has a lot of computers available.

The attacker's goals, in decreasing order of priority: recovering the secret key, recovering plaintext from ciphertext (without the key), or learn some characteristics of the plaintext given ciphertext (besides message length).

An SKES is secure if and only if it is resistant to a chosen-plaintext attack by a computationally bounded adversary (computational-theoretic security). A secure SKES is necessary but not sufficient to guarantee confidentiality.

Additionally, we also assume that the adversaries know everything about the cryptosystem except the plaintext and the key, including the all the algorithms and communicatoin mechanisms.

An SKES should ideally be efficient (for encryption/decryption), have small keys (but not so small that it makes brute force attacks possible), and be secure (even against the designer of the SKES).

A work factor measures how hard a task is, in terms of how many computations are needed to complete it: 2^{56} operations is easy, 2^{64} is feasible, 2^{80} operations is barely feasible, and 2^{128} operations is infeasible. This will change as our computers get faster. For example, the entire bitcoin network is doing 2^{61} hashes per second. However, we use 2^{128} operations as infeasible because the Landauer limit tells us doing that many operations on a classical computer would take a significant fraction of the world's power output for a year.

For example, a substitution cipher is not secure, because it's trivially broken by a chosen-plaintext attack. By choosing the alphabet as the message, the ciphertext is just the secret key. In fact, it's not even secure against ciphertext-only messages. Although it's infeasible to exhaustively try every permutation of the alphabet for valid-looking decrypted messages, we can simply use letter frequency analysis to try likely candidates first.

A polyalphabetic cipher uses multiple alphabet permutations for substitution, choosing between them with a particular algorithm. For example, a Vigenere cipher has a word with non-repeated letters added to the message's letters mod 26. This is nice because as the message grows longer, the letter frequency graph is a bit flatter. We can break that by using a chosen-plaintext attack with plaintext "AAAAAAAAAAAAAAAAAA", so if "A" corresponds to the numerical value 0, the ciphertext is just the key. Therefore, the Vigenere cipher is not secure.

From now on, we assume plaintext, keys, and ciphertext are all binary strings. What happens when we apply the concept of a Vignere cipher, but with a uniformly random key that's as long as the entire message?

A one time pad XORs a random key with the plaintext (the key is as long as the message). However, it's important to not reuse keys, because c_1 = m_1 \oplus k_1 and c_2 = m_2 \oplus k_2 implies that c_1 \oplus c_2 = m_1 \oplus m_2, which leaks a lot of information about the plaintext. If the key is uniformly randomly selected, every key is equally likely, so every ciphertext is also equally likely. This is information-theoretically secure - a one-time pad provably cannot be broken by a ciphertext-only attack even by attackers with infinite computational resources. While one-time pads have these nice properties, in practice they are hard to use because the key has to be as long as the ciphertext, making sharing keys a pain.

A stream cipher is like a one-time pad, but instead of a truly random key, we use a pseudorandom bit generator, where the PRBG's seed is the secret key. This is no longer perfectly secure because it depends on the quality of the randomness of the PRBG, but it's often a practical compromise since the key can be a lot shorter. Like with a one-time pad, we also shouldn't re-use keys, since that would give us some of the values of the PRBG's output by XORing the ciphertexts, which would make it easier to learn the PRBG's state. One example of a stream cipher is RC4.

## 9/1/17

The PRBG should satisfy two requirements:

• Indistinguishability requirement - the output should be indistinguishable from a random sequence.
• Unpredictability requirement - the output should not be predictable even if some of the previous outputs are known.

Most random number generators built into programming languages are not cryptographically secure - they're only intended to satisfy the indistinguishability requirement. For example, in UNIX uses a linear congruential generator with known constants, which is straightforward to predict.

RC4 was designed by Rivest, the R in RSA. It was used in everything from SSL/TLS to Adobe Acrobat, as one of the most popular stream ciphers around. It's nice because it's very fast and has variable key sizes, but was proprietary for a long time has a lot of weaknesses, even though none of them are catastrophic. It consists of a key scheduling algorithm, and a keystream generator.

The key scheduling algorithm generates a random-looking permutation of 0, \ldots, 255. It starts by initializing S with 0, \ldots, 255, and \overline{K} to the key repeated over and over until it fills 256 entries in the array. Then, the array entries in S are swapped based on j_{i + 1} = (S[i] + \overline{K} + j_i) \mod 256. The keystream generator just applies that permutation to generate the keystream, which is then XORed with the plaintext to get the ciphertext.

Wireless networks lose the physical security of wired networks, and security for those networks is a lot harder because attackers can do it from a distance with no physical evidence. The original WiFi standard, IEEE 802.11, includes the Wireless Equivalent Privacy (WEP) protocol for protecting link-level data in transit between clients and access points. WEP was intended to provide confidentiality using RC4, data integrity using a checksum, and access control by rejecting improperly encrypted packets.

In WEP, the client first shares a 40-bit or 104-bit key with the access point (this was because the US classified cryptography as munitions, disallowing most cryptography with keys greater than 40 bits for export; and keys greater than 104 bits for domestic use). Messages are then divided into fixed-size packets, which are then each encrypted with a per-packet initialization vector (IV). The issue is that WEP didn't specify certain things, like how the key should be distributed, and how IVs should be managed.

Implementations ended up using one shared key per LAN, infrequently changed, and just generating random IVs or using consecutive integers as IVs. To send a packet, a party would select a 24-bit IV, compute the CRC-32 checksum of the message, the checksum is then appended to the plaintext, and then XORed with the RC4 keystream to get the ciphertext, where the key is the IV the shared WEP key appended to the IV. The sender then transmits the IV and the ciphertext. The receiver then, having the IV and the WEP shared key, gets gets the plaintext concatenated with the checksum, and then verifies that the checksum is correct, rejecting the packet if the checksum doesn't validate.

Turns out, WEP implements none of its goals. One problem is IV collision - since the IV is only 24-bits, the birthday paradox means that only about 2^{12} packets are needed for a collision. If two packets have the same IV, then c_1 \oplus c_2 = m_1 \oplus m_2 - the ciphertexts can be XORed to get the plaintexts, which can then be analyzed using known plaintexts or statistical analysis, so confidentiality is broken. Another problem is that the checksum is linear - we can make certain changes to the ciphertext while still ensuring that the checksum will verify correctly, so data integrity is broken. Finally, the checksum is unkeyed, so knowing the plaintext for one encrypted packet is enough to get the RC4 keystream and encrypt messages properly themselves.

Shamir and co. (the S in RSA) came up with an even better attack in 2001, based on the fact that 104-bit keys were infrequently changed, the IV is very predictable/randomly selected, and we know the first byte of the plaintext (the protocols add known headers to the plaintext before encrypting). With 5 million packets, the secret key itself can be recovered. Modern techniques can recover the key in just 40000 packets.

IEEE published updated wireless security standards. For example, WPA was a temporary replacement, and WPA2 came out afterwards, using AES instead of RC4, and designed to be much stronger.

## 11/1/17

Assignment 1 should be started now.

Attacks only get better over time, never worse. In WEPs case, it wasn't the security of the algorithms like RC4 that were broken, it was the use of those algorithms that was incorrect. In the future, thanks to upcoming advances related to Moore's law and quantum computing, currently secure cryptosystems could easily become broken.

A block cipher is the other common type of SKES. While a stream cipher encrypts one bit at a time, a block cipher breaks up the plaintext into fixed-length blocks, encrypting the message one block at a time. Some commonly known block ciphers are DES and AES.

DES has a 56-bit key length (which is now relatively easy to break), and a block size of 64 bits. AES (Rijndael, "rined'all") is its successor, and has a 128/192/256 bit key length. There are currently no significant known attacks on AES.

Clause Shannon gave a couple of principles for designing good block ciphers:

• Diffusion - each ciphertext bit should depend as many plaintext bits as possible.
• Confusion - there should be a complicated relatoinship between key and ciphertext bit.
• Key size - keys should be large enough to prevent exhaustive search.

Also, block ciphers should be fast, so we can use them in more applications.

A Feistel cipher is a class of ciphers, including DES. Feistel ciphers are parameterized based on n (half of block size), h (number of rounds), and l (key size). It generates h subkeys from the cipher key, one for each round. Each key is used to generate a component function f_1, \ldots, f_h that "scrambles" its input.

Each block m is then broken into two halves, \tup{m_0, m_1}. Then, we perform h encryption rounds: at each round i, the right half gets moved into the left half, and the original left half is XORed with f_i applied to the original right half, and that becomes the right half. After those h rounds, the two halves are the ciphertext.

## 13/1/17

;wip: feistel ciphers

The New Data Seal cipher was invented by IBM as the predecessor to DES. It's a Feistel cipher, and is a relatively complicated one, but it also happens to be completely broken today to chosen-plaintext attacks. It has a block size of 64 bits, and uses 16 rounds, so n = 64, h = 16. Here's the basic idea:

1. Let S_k: \set{0, 1}^8 \to \set{0, 1}^8 be the secret key - an arbitrary function of a byte.
2. Let f: \set{0, 1}^{64} \to \set{0, 1}^{64}, the component function, be defined as follows:
1. Since the input is 64-bits, let z = \tup{z^1, \ldots, z^8} be the 8 bytes of the input.
2. For each byte z^j, let n_1^j = z^j[7:4] and n_2^j = z^j[3:0] - the two nibbles of the byte.
3. Let t = S_k(z^1[7] \ldots z^8[7]) - the key function applied to the byte obtained by taking the first bit of each byte in z.
4. For each byte z_j, if bit j of t is 1, let p_1^j = S_1(n_2^j) and p_2^j = S_0(n_1^j), otherwise let p_1^j = S_0(n_1^j) and p_2^j = S_1(n_2^j). S_0, S_1 are functions defined by the NDS specification (that are too long to list here), and we're swapping the nibbles if the corresponding bit of the key function S_k is true.
5. Output P(p_1^1 p_2^1 \ldots p_1^8 p_2^8) - a scrambling function P applied to all of the scrambled and permuted nibbles concatenated together. P is also a function defined by the NDS specification (that is too long to list here).
3. For each 16-byte block of the input z = \tup{z^1, \ldots, z^8} (each block contains 2n bits), run the Feistel cipher ladder with f as the component function:
1. Let m_0, m_1 be the two 8-byte halves of the 16-byte block.
2. For 16 rounds 1 \le j \le 16, set m_j to m_{j - 1} and m_{j + 1} to m_{j - 1} \oplus f(m_j) (simultaneously).
3. Output m_{16} m_{17}.

Since S_k can be considered a random function, there are 2^8 possible values of S_k(x) for each of the 2^8 possible values of x - we might think of serializing the function as a 256-byte sequence. That means there are 256^{256} possible values of S_k, or 2^{2048}, which makes trying every possible S_k computationally infeasible.

In fact, we can recover the entire secret key from the ciphertext in a few hundred chosen plaintext attacks. The main issue is that there's no subkeys - each round used the same component function and secret key, rather than deriving unique component functions for each round - each f_1, \ldots, f_h is the same function! Let T denote the function that does one round of encryption, with regard to the secret key S_k. Basically, T(\tup{m_{i - 1}, m_i}) = \tup{m_i, m_{i - 1} \oplus f(m_i)} for some fixed function f and the two halves of the message are m_0, m_1.

Clearly, applying T(m) to the message 16 times is the whole encryption function. Let's represent this as T^{16}(m). Let F = T^{16} - this is the full encryption function, since the cipher is just applying T 16 times. Clearly, T(F(m)) = T(T^{16}(m)) = T^{17}(m) = T^{16}(T(m)) = F(T(m)) for any byte m.

Here's the attack. For every possible byte r, we want to determine S_k(r). Select u = \tup{m_0, m_1} such that the byte formed by taking the first bit of each byte in m_1 is r, and p_1^j \ne p_2^j for 1 \le j \le 8 - the scrambled nibbles aren't the same within each of the 8 bytes in m_1.

As an aside, x^* seems to mean "the first bit of each byte in x".

By the rules of the chosen plaintext attack, we can get Bob to give us \tup{a, b} = F(u), but not what F(m) or T(m) are (we know the values of the function at this point, but that's it).

However, we know that T(F(u)) = F(T(u)) = \tup{b, \cdot}. Since the value of S_k(r) is only a byte, we can guess every possible value of S_k(r), and check each guess t by computing T_t(u) and then get Bob to give us F(T_t(u)) = \tup{c, d}. Clearly, b \ne c implies that S_k(r) \ne t, and b = c implies that S_k(r) = t is very very likely (there is a tiny chance of accidentally getting it right). This is because we're assuming that F works roughly like a random permutation, so since b and c are 64 bits, F(T_t(u)) has only a \frac 1 {2^{64}} probability of b = c without S_k(r) = t, because we selected u such that p_1^j \ne p_2^j. Eventually, we get every possible value of S_k.

Also, since we check every possible byte r (256 values), and for each value of r, we guess-and-check possible bytes t (on average, 128 checks, worst case 256), we should expect 128 \times 256 = 2^{15} chosen plaintexts on average before recovering the entirety of S_k.

The entire attack can be summarized as:

1. For each possible byte r:
1. Find a byte u = \tup{m_0, m_1} such that m_1^* = r and p_1^j \ne p_2^j, 1 \le j \le 8.
2. Get Bob to compute F(u) = \tup{a, b}.
3. For each possible byte t:
1. Compute T_t(u).
2. Get Bob to compute F(T_t(u)) = \tup{c, d}.
3. If b = c, we know that S_k(r) = t with overwhelming probability, and then go to the next r.
2. Now we have S_k(r), the secret key. We can use this to decrypt the ciphertext.

Basically, we need about 2^{15} chosen plaintexts to recover the entire secret key, which is quite feasible on today's computers.

## 16/1/17

Chosen plaintext attacks are not always feasible, but when they are they're very powerful. For example, consider a mail forwarder that encrypts incoming emails and forwards them to another destination.

DES is one of NDS' successors. Originally with 64-bit keys, the NSA weakened the final standard to 56 bits. DES is pretty weak compared to modern ciphers, but 3-DES is still commonly in use. The design principles are classified, but it's been heavily analyzed.

DES is a Feistel cipher with 16 rounds, a 64-bit block and a 56-bit key. For each round of the cipher, a 48-bit subkey is derived from the 56-bit secret key, by selecting 48 bits from the secret key. The plaintext halves are also expanded from 32-bits to 48-bits. After each Feistel round completes, we use 8 S-box functions (which are publicly known) to scramble the XORed result and reduce 48-bit results to 32-bits. This is the source of nonlinearity in DES, and is very important for security.

With a key space of size 2^{56}, it's perfectly practical to just brute force today, even if we can't do it on our laptops just yet. A massively parallel effort broke a DES-encrypted message in just over 22 hours in 1999, in response to a challenge set by RSA security. Also, it's got a block size of 64-bits, which means that you'd expect a collision at around 2^{32} uniformly distributed blocks, thanks to the birthday paradox. So, on a busy network, if Alice and Bob are sending many blocks back and forth, Eve might observe a duplicated block, which tells us a little bit of information about the plaintext - those two corresponding blocks in the plaintext are identical (this isn't much of a problem in practice, but is still an issue).

Differential cryptoanalysis attacks can be used to recover the key in 2^{47} chosen plaintexts, though this is usually an infeasibly large number of chosen plaintexts in practice (turns out, DES was specifically designed to guard against this - the NSA was aware of the attack before it was public knowledge). Linear cryptoanalysis attacks can do the same thing in just 2^{43} chosen plaintexts, which is slightly more practical.

After these weakenings, 3-DES was invented as a more secure SKES in order to take advantage of existing DES hardware (DES was designed for hardware acceleration). Basically, it's just DES applied to DES applies to DES applied to the plaintext. Note that applying ciphers repeatedly doesn't always provide more security (consider a substitution cipher, for example), but in this case it provides reasonably good. Another variant is Double-DES, which is DES applied to DES applied to the plaintext (each DES function has its own key, so we have a 112-bit key overall).

Turns out Double-DES is vulnerable to a meet-in-the-middle attack. Basically, if E_{k_1}, E_{k_2} are the two DES encryption functions in Double-DES and c = E_{k_2}(E_{k_1}(m)), then E^{-1}(c) = E_{k_1}(m).

Suppose we have 3 known plaintext/ciphertext pairs \tup{m_1, c_1}, \ldots, \tup{m_3, c_3}. For each possible 56-bit k_2, decrypting c_1 with our guess for k_2, and store that plaintext and the key used in a table, indexed by the plaintext. Then, for each possible 56-bit k_1, try encrypting the m_1 with our guess for k_1, and see if it matches any entry in the table. If there's a match, and those guesses for k_1, k_2 also satisfy c_2 = E_{k_2}(E_{k_1}(m_2)) and c_3 = E_{k_2}(E_{k_1}(m_3)), then our guesses for k_1, k_2 are the actual secret keys.

Due to this attack, Double-DES is only slightly better than plain DES, with about 2^{57} bits of security and requiring about 2^{56}(64 + 56) bits of memory to store the table. The main hard part is getting the ~1 exabyte of storage needed to mount this attack, but it's quite within the realm of possibility. We can even do a time-memory tradeoff and do 2^{56 + s} DES operations in return for 2^{56 - s} bits of memory, which can easily bring the memory requirements down to a manageable size while keeping the number of operations feasible.

## 18/1/17

Why do we need 3 pairs of known plaintext/ciphertext pairs rather than just 1 or 2? Well, for each key in the key space, we can think of the encryption function is just a random function (it's not, but this is a good enough assumption for our purposes).

Let k be the actual l-bit secret key (so E_k(m) = c) and k' (also l bits) be our guess for this key. If E_{k'} is a uniformly random function (from our assumption), then it clearly has a \frac 1 {2^L} probability of satisfying E_{k'}(m) = c, where L is the number of bits in the plaintext - it might just randomly happen to output the right ciphertext given our particular plaintext. What's the probability that our guess is correct if it satisfies E_{k'}(m) = c?

Suppose E_{k'}(m) = c and k' \ne k (our guess for the key is incorrect, but gives the right ciphertext for our particular plaintext). Then the number of E_{k'} such that E_{k'}(m) = c is \frac{2^l - 1}{2^{L}} (2^l - 1 is the number of keys in the keyspace that are not the actual key, and 2^{L} is the number of possible plaintexts). This is essentially the number of incorrect keys that work for a single plaintext/ciphertext pair without actually being the correct key.

We want to use enough plaintext/ciphertext pairs such that the expected number of these incorrect keys is close to 0. Clearly, with t of these plaintext/ciphertext pairs, we have \frac{2^l - 1}{2^{Lt}} expected false keys. For Double-DES, one pair gives us \frac{2^{112} - 1}{2^{64}} expected incorrect keys, which is far too many. With 3 pairs, we get around \frac 1 {2^{16}} expected false keys, much more reasonable.

There's also triple DES, which is just applying DES three times now - c = E_{k_1}(E_{k_2}(E_{k_3}(m))), where c is the resulting ciphertext, m is the plaintext, and k_1, k_2, k_3 are the three 56-bit DES keys. We don't have any proof that it's more secure than DES alone, but a meet-in-the-middle attack takes around 2^{112} steps, around 2^{64} message/plaintext pairs, and a huge table.

Given our plaintext m = m_1 \ldots m_t, how do we encrypt it if the Block cipher modes of operation:

• Electronic Codebook (ECB) - input is split into blocks, and then each block is encrypted with the key using the block cipher. In other words, for each plaintext block i, c_i = E_k(m_i).
• This is very bad, never use it - identical plaintext blocks result in identical ciphertext blocks under the same key, so chosen-plaintext attacks can tell you whether a given plaintext block is equal to an attacker-chosen plaintext block.
• Cipher Block Chaining (CBC) - select a random L-bit initialization vector (where L is the block size) as c_0. Then, for each plaintext block i, let c_i = E_k(m_i \oplus c_{i - 1}).
• This is actually proven to be semantically secure against chosen plaintext attacks, assuming that the block cipher itself is semantically secure.

AES was developed as part of a public, open competition in 1997 to build a successor to DES. It needed to have 3 key sizes (128, 192, and 256 bits), have a 128-bit block size, and be efficiently implementable on hardware/software. While 128 bits is infeasible for the foreseeable future, the larger key sizes are intended to protect against potential future attacks, as well as quantum computers - Grover's algorithm can perform exhaustive key search in only 2^{\frac 1 2 l} operations, where l is the number of bits in the key.

With 15 submissions initially, Rijndael won in the end. Rijndael became AES in late 2001.

Rijndael is an iterated block cipher, not a Feistel cipher. Specifically, a substitution-permutation network. Currently, there's no known attack better than exhaustive key search on AES.

We won't cover the technical details of AES in this course, but some more details are available in the slides

## 20/1/17

x \in_R S means that x is randomly and independently chosen from S.

A hash function is a fixed function that transforms an input of arbitrary length and output a fixed-length string, called a digest. Some common hash functions are MD5, SHA1, and SHA512.

Formally, an n-bit hash function is H: \set{0, 1}^* \to \set{0, 1}^n. A good hash function should satisfy three examples:

• Pre-image resistance - given a randomly selected hash value y \in \set{0, 1}^n, it is computationally infeasible to find x such that H(x) = y.
• In other words, it's hard to invert the hash function.
• Consider a service that stores accounts as user IDs and hashes of passwords. If the account data was leaked and the hash function didn't have pre-image resistance, attackers could obtain the passwords from the password hashes.
• First pre-image resistance generally implies second pre-image resistance in practice, but not in theory. If we can do a first pre-image attack, then we could mount a second pre-image attack by hashing the given document and repeatedly running the first-preimage attack on that hash until we get a document that isn't equal to the original.
• Second pre-image resistance - given a random value x \in \set{0, 1}^*, it is computationally infeasible to find a x' \ne x such that H(x') = H(x).
• In other words, it's hard to find a collision for a given input in the hash function with non-negligible probability.
• Consider a software publisher that has users verify their updates using a certain hash. If the hash function doesn't have second pre-image resistance, attackers can construct a different update that still passes verification.
• Collision resistance - it is computationally infeasible to find x, x' \in \set{0, 1}^* such that x \ne x' and H(x) = H(x') (a collision is such a pair \tup{x, x'}).
• In other words, it's hard to find any collisions in the hash function.
• Note that hash functions must necessarily 0, have collisions, by pidgeonhole principle and the fact that the output space is finite while the input space is infinite. However, it should be computationally feasible to find even a single one.
• Consider the common practice of digital signature schemes signing the hash of the message rather than the message itself (due to overhead of signing long messages using things like RSA). If a hash wasn't collision resistant, the signer could find a collision x, x'$, sign x, and later on claim to have signed x' instead. That means Alice might • Collision resistance implies second pre-image resistance (this is easily proved via the contrapositive). • Second pre-image resistance doesn't imply collision resistance, however (this is easily proved by constructing a hash function with only 1 collision that is computationally feasible to find: the probability that we actually get that colliding input is negligible for a second pre-image attack, but the collision is definitely there, so we have a second pre-image resistant hash function that isn't collision-resistant). • Collision resistance also doesn't imply pre-image resistance. Let H be an n - 1-bit pre-image and collision resistant hash function. Let \overline H = \begin{cases} 1 \| x &\text{if } x \text{ has } n - 1 \text{ bits} \\ 0 \| H(x) &\text{otherwise} \end{cases}. Clearly, H' is still collision-resistant, but any random hash value that begins with a 1 (so 50% of all hash values) has a trivially computable pre-image (just take off the 1 at the beginning). Therefore, H is a function that is collision resistant yet not pre-image resistant. • That said, if the hash function is somewhat uniform (so hash values all tend to have roughly equal numbers of pre-images), collision resistance does guarantee preimage resistance. We'll prove this by contradiction. Suppose we have a hash function H that is collision resistant and somewhat uniform, but not pre-image resistant. Let x be an arbitrary input, and find a pre-image of H(x). Since H(x) has roughly the same number of pre-images as for any other hash value, it should have a lot of pre-images. Plus, since it's not pre-image resistant, we can find another x' such that H(x') = H(x). However, x' and x are now a collision, which we found efficiently, so H isn't collision resistant - contradiction! Therefore H must be pre-image resistant as well. • Therefore, collision resistance is the hardest property to achieve in practice, which makes sense, since the hash function must have infinite collisions. A Davies-Meyer hash function is a family of hash functions that uses block ciphers. Basically, given a fixed IV H_0 and an n-bit block encryption function E_k, we break up the plaintext into n-bit blocks x_1, \ldots, x_t (after appending 1 to the end and padding with 0 to the nearest block size), then H_i = E_{x_i}(H_{i - 1}) \oplus H_{i - 1} and H_t is the hash value. A one-way function is a pre-image resistant hash function. A cryptographic hash function is a pre-image resistant and collision-resistant hash function. A generic attack is an attack on a hash function that treats it as a black-box/ideal hash function. As a result, a generic attack must treat the hash function as a random function H: \set{0, 1}^* \to \set{0, 1}^n (an ideal hash function is a random function). In practice, actual random functions are too complex to represent in a reasonable way. A generic attack for finding pre-images: select arbitrary x \in \set{0, 1}^* until we find H(x) = y. Clearly, we expect about 2^n time. A generic attack for finding collisions: select arbitrary x \in \set{0, 1}^* and store them in a table indexed by H(x), until a hash is found. By birthday paradox, we expect that to take about \sqrt{2^n} time and memory. ## 23/1/17 Checking for collisions generically, as mentioned last class, would take around \sqrt{2^n} time and memory. However, there's actually a generic attack for finding collisions that has the same order of running time (\sqrt{\frac{\pi 2^n}{2}}), but requires far less memory. Basically, given H: \set{0, 1}^* \to \set{0, 1}^n, we want to find x_1, x_2 \in \set{0, 1}^*, where x_1 \ne x_2 yet H(x_1) = H(x_2). How does this algorithm, known as the Van Oorschot/Weiner (VW) algorithm, work? First, we assume that H: \set{0, 1}^n \to \set{0, 1}^n is a random function. Now, we'll search for x_1, x_2 in \set{0, 1}^n rather than \set{0, 1}^*, by defining a sequence of hash values w_0 \in_R \set{0, 1}^n, w_i = H(w_{i - 1}) - a sequence consisting of a hash value, the hash of that hash value, the hash of the hash of that hash value, and so on. This is a random sequence since we're assuming H is a random function. Clearly, there must be a repeat (collision) in the sequence somewhere, because an element of the sequence can only have one of 2^n possible values. That means the sequence is finite and ends in a cycle. Let j be the smallest index such that x_j = x_i for some i < j - j must exist because there must exist a collision. Since , x_{j + l} = x_{i + l} for any l \ge 0. Let N = 2^n. From the birthday paradox, we expect the value of j to be E[j] = \sqrt{\frac{\pi N}{2}}, or around \sqrt{2^n}. Also, E[i] = E[j - i] = \frac 1 2 \sqrt{\frac{\pi N}{2}}. If we assume, without loss of generality, that i \ne 0, then H(x_{i - 1}) = H(x_{j - 1}) and x_{i - 1} \ne x_{j - 1} (with very high probability) - a collision! How do we actually find x_{i - 1}, x_{j - 1}, without using much storage? If we think of the sequence of hashes is like a linked list, where H(x_i) gets the next element x_{i + 1}, then this simply becomes a problem analogous to finding cycles in a linked list, which we could solve using something like the tortoise-and-hare algorithm with just constant memory. However, that means a lot of wasted time unnecessarily computing hashes, and it only tells us that there is a collision, not what the collision actually is. A better way we could do this is to define some easily distinguishable property of the hash values that applies to around fraction \theta of all hash values, for a small value of \theta. For example, the property might be "the first 20 bits are 0" (\frac 1 {2^{20}} of hash values satisfy this property). Then, we only store the hash values in the sequence that satisfy this property (these hash values are called "distinguished points"). When we encounter a distinguished point for the second time, we've found a collision somewhere behind us in the sequence - assuming there's a distinguished point between i and j, we must eventually encounter it again when our sequence loops around to x_{j}. We can assume there's a distinguished point between i and j because our \theta is chosen to be large enough that there is a very high probability that there is a distinguished point in the cycle - for example, for a 32-bit hash we would expect the cycle to contain about \frac 1 2 2^{32} = 2^{31} entries, so we might make our distinguishing property "the first 21 bits are 0", giving an expected value of 2^{10} distinguished points in the cycle. That does mean there's a very tiny chance the algorithm doesn't terminate, however - we can get around this by limiting the length of the path to 2^n and retrying. There are a lot of other edge cases too (what if the starting value has a collision in the sequence?), but they all happen with negligible probability and can be handled with relatively few operations. Algorithm: 1. Randomly select a x_0 from \set{0, 1}^n, store \tup{x_0, 0, \text{empty}} in the table. 2. Let L = x_0 (L is the most recently stored distinguished point). 3. For j \ge 1 do (phase one starts - finding duplicate points): 1. Let x_j = H(x_{j - 1}). 2. If x_j is distinguished: 1. Store \tup{x_j, j, L} into the table and let L = x_j. 2. If x_j was already in the table, we know x_i = x_j, and stop this loop to begin phase 2 - finding the actual collision values. 4. Let a be the index of the distinguished point right before i, and b be the index of the distinguished point right before j. This can be looked up as the third element of tuples in the table - the values of L associated with x_i and x_j respectively. 5. Let l_1 = i - a, l_2 = j - b. Assume without loss of generality that l_1 \ge l_2. Let R = l_1 - l_2 - how much longer the earlier chain is until the collision than the later chain. 6. Compute x_{a + 1}, \ldots, x_{a + R}. Let p_1 = x_{a + R} and q_1 = x_b. Now the collision is an equal number of hashes away from p_1 and q_2. 7. Let p_{k + 1} = H(p_k) and q_{k + 1} = H(q_k). Compute p_k and q_k until we find a value k such that p_k = q_k. Then, the previous value p_{k - 1} and q_{k - 1} form a collision. So overall, step 3 takes around \sqrt{\frac{\pi N}{2}} + \frac 1 \theta time, while the steps after it take \frac 3 \theta. The memory usage is around \theta \sqrt{\frac{\pi N}{2}} \times 3n bits. ## 25/1/17 If we have a 128-bit hash, we might choose a \theta = \frac 1 {2^{32}}, for about 2^{96} distinguished points. With this, we would expect around 2^{64} operations and 2^{41} bits, or around 256 gibibytes - totally feasible on modern hardware. This tells us that any 128-bit hashes are simply not collision-resistant today. How do we parallelize the VW algorithm, given m processors? One naive way is to run the algorithm with different starting values on each processor (each with its own memory), and see which one finishes first. Turns out this uses time \frac{\sqrt{\frac{\pi N}{2}}}{\sqrt{m}} + \frac 4 \theta - not very good scaling. Instead, we can store all of the distinguished points from the m processors in a central table - at each time unit, each processor adds a new unique hash value, until one of the distinguished points collide. This takes \frac{\sqrt{\pi N}{2}}{m} + \frac 4 \theta operations, which is about as well as you can expect this algorithm to be parallelized. How do we find meaningful collisions in a hash function? Suppose m_1 is "Alice owes Bob 1000 dollars" and m_2 is "Alice owes Bob 10 dollars". Alice wants to find values m_1', m_2' that semantically mean the same thing as m_1, m_2 (e.g., "Alice should give Bob$1000" and "Alice ought to pay Bob $10"), sign m_1, make Bob agree to it, and then later claim to have signed m_2. After Bob agrees to it, Alice can claim to have signed m_2 instead, only owing 10 dollars rather than 1000 dollars. How can Alice find m_1', m_2' using the VW algorithm? Partition m_1 into n parts, with partitions at indices j_1, \ldots, j_n. Let g_{m_1}(r): \set{0, 1}^n \to \set{0, 1}^* be m_1 modified so that a space is inserted into m_1 at position j_i if and only if the ith bit of r is 1. Note that g_{m_1}(r) should have the same meaning as m_1, since we just added a space somewhere. Likewise, define g_{m_2}(r) for inserting spaces into m_2. Let S_0 be the elements of \set{0, 1}^* that begin with 0, and S_1 be the elements that begin with 1. Let f: \set{0, 1}^n \to \set{0, 1}^n as f(r) = \begin{cases} H(g_{m_1}(r)) \text{ if } r \in S_0 \\ H(g_{m_2}(r)) \text{ if } r \in S_1 \end{cases} - the hash of a message semantically equivalent to m_1 if r begins with 0, and the hash of a message semantically equivalent to m_2 if r begins with 1. If we run the VW algorithm on f to find a colliding r, we get a collision a, b \in \set{0, 1}^n with a \ne b and f(a) = f(b). With 50% probability, the first bit of a is different from the first bit of b, assuming they're drawn from a uniform distribution (we can assume this because we assume the hash function is a random function). Suppose that the first bits are different - then a = g_{m_1}(r) and b = g_{m_2}(r), and a, b are two messages, semantically the same thing as m_1, m_2 respectively, that collide. If the first bits are not different, then we can just run the VW algorithm repeatedly with a different starting point until we get a collision that does have messages with two different first bits. Clearly, the running time is essentially the same for each iteration, and with a 50% probability of success, we can just run it a few times with different variants of g_{m_1}, g_{m_2} until we get a meaningful collision. ## 27/1/17 So now we've looked at the naive hash collision finding, the VW algorithm, the parallel VW algorithm, and the meaningful collisions with the parallel VW algorithm. Now let's look at general methods for constructing hash functions. Merkle's meta method is one of these. Given an input of n + r bits, we define a compression function f: \set{0, 1}^{n + r} \to \set{0, 1}^n. We also have an IV value in \set{0, 1}^n. Using these, we can use Merkle's meta method to define a hash function H: \set{0, 1}^* \to \set{0, 1}^n. Let x \in \set{0, 1}^*. Let x_1 \ldots x_t be x broken into r-bit blocks (the last one should be zero-padded if needed). Let x_{t + 1} be the binary representation of the length of x in bits (technically, this limits the length of x to 2^r, but in practice 2^r is so large that we can simply pretend x is unbounded in length). Let H_0 be the IV. For 1 \le i \le t + 1, we do H_i = f(H_{i - 1}, x_i). Then, H(x) = H_{t + 1}. Merkle's theorem says that if f, the compression function, is collision-resistant, then so is H. This means we only have to make a fixed input size function collision-resistant, rather than an unbounded-input-size function collision-resistant. That means we should usually try to attack the compression function f rather than the hash function itself. This theorem can be proven using the contrapositive. Assume H is not collision resistant, so we can efficiently find a collision \tup{x, x'} \in \set{0, 1}^*. Let \overline x = x_1 \ldots x_t and b be the length of x in bits and H_1, \ldots, H_{t + 1} be the iteration values for x, and let \overline x' = x_1' \ldots x_{t'}' and b' be the length of x' in bits and H_1', \ldots, H_{t' + 1}' be the iteration values for x'. Clearly, if b \ne b', then x_{t + 1} \ne x_{t' + 1}', so f(H_t, x_{t + 1}) = f(H_t', x_{t' + 1}') and x_{t + 1} \ne x_{t' + 1}' - a collision! Now suppose b = b', so x_{t + 1} = x_{t + 1}' and t = t'. If we work backwords from the last iteration toward the first iteration, we must eventually find some H_i = f(H_{i - 1}, x_i) and H_i' = f(H_{i - 1}', x_i') such that \tup{H_{i - 1}, x_i} \ne \tup{H_{i - 1}', x_i'}, since x \ne x' - they must differ in at least one message block. This is a collision for f. So, a collision is always computationally efficient to find for f. So if f is collision resistant, H must be as well. ## 30/1/17 MDx is a family of iterated hash functions, the most well-known of which are MD4 and MD5. MD4 was a 128-bit hash invented by Ron Rivest in 1990, but it had some serious flaws - in 2004 someone found collisions by hand. Pre-image is still infeasible, however. MD5 is another 128-bit hash, a strengthened version of MD4. Since its invention in 1991, we've found MD5 collisions in 31 seconds in 2006 on commodity hardware, while preimages can be found in 2^{123.4} steps. However, MD5 is still in very wide use today. SHA-1 is a 160-bit hash function invented by the NSA in 1993, and modified by the NSA again in 1994 to fix a classified weakness. In 2005, it was found that that fix increased the cost of finding collisions from 2^{39} to 2^{63}. Collisions are feasible to find today, but nobody seems to have done it yet - we still want to move away from SHA-1 though, since it's theoretically feasible to find collisions. We don't know of any pre-image/second pre-image attacks on SHA-1 that are faster than the generic attacks. In about two weeks from now, SSL will no longer consider SHA-1 hashes for server certificates valid. SHA-1 is a Merkle-style hash with n = 160, r = 512. Most of the design decisions are classified, because NSA, and the hash operations themselves are rather messy. In 2001 the NSA proposed SHA-2, a family of hash functions with three variants, each with different output sizes - SHA-256, SHA-384, and SHA-512. Note that SHA-2 has the same security against collisions as AES-128/AES-192/AES-256. As of this writing, no attacks stronger than the generic attacks have been found, but there are concerns remaining because the design is pretty similar to SHA-1. SHA-3 is a NIST hash function competition, much like the competition that resulted in AES. Starting with 64 candidates, Keecak was selected as the winner in 2012. Keecak isn't an iterated hash function, using a sponge construction method, and it isn't very widely used in practice due to most people being fine with the SHA-2 family. NIST recommends we stop using SHA-1 if possible, though it's okay for HMACs/KDFs/RNGs. SHA-2 and SHA-3 are recommended. Suppose we can find a meaningless collision in an iterated hash function \tup{x, y}. How can we exploit this in practice? Collision attacks are useful when we can modify both the original document, and the colliding document. Wang proposed an attack on SHA-1 that gives us a two-block collision (two two-block-long messages that hash to the same value) that takes only around 2^{63} operations. Wang also proposed an attack on MD5 that gives us a one-block collision (two one-block-long messages that hash to the same value) in only 2^{39} operations. We've actually applied these and found real collisions in SHA-1 and MD5. How do we exploit this concretely? Suppose \tup{c_1, c_2} is a one-block MD5 collision (without loss of generality, assume the blocks don't contain or , so it can fit into a Postscript literal). Now consider the following two Postscript files: The two files are identical except for line 3, where we have in the first and in the second. Clearly, the first one prints out the harmless message, while the second one prints out the harmful message. However, note that the second line has been padded with spaces so that the second block starts exactly at either in the first message, or in the second message (MD5 uses a block size of 64 bytes, and the file is assumed to be using Windows-style CR-LF line endings). Suppose x and y have the same number of blocks in an iterated hash function. Let F(I, x) = H_t where I = H_0 and H_i = f(H_{i - 1}, x_i). Clearly, if F(I, x) = F(I, y), then F(I, xz) = F(I, yz) for any string z. In other words, given a collision \tup{x, y}, \tup{xz, yz} is also a collision for any string z. This is called a length extension attack. Also, if F(I, x) = F(I, y), then F(I, zx) = F(I, zy) for any string z that is an integer multiple of the block size long. In other words, given a collision \tup{x, y}, \tup{zx, zy} is also a collision for any blocks z. Note that each Postscript document can be made by prepending a block to one of /, and then appending the rest of the document. Since / collide, the Postscript documents themselves must also collide! Suppose Alice sends Bob the first Postscript file, and asks Bob for a signature. Bob opens it in a Postscript viewer, sees the harmless message, then signs the MD5 hash of the document (because signing the whole document with public key crypto would be too unwieldy), and sends the signature to Alice. However, Alice now has a signature from Bob for the second Postscript document, which has the same MD5 hash but a different, harmful message - Alice has convinced Bob to sign something Bob didn't want to! (Menezes now demonstrates an MD5 collision on a real document, where the MD5 hash of a Postscript file that results in a PDF saying "John didn't leave his hat in my office" collides with a Postscript file that results in a PDF saying "John should be given a mark of 100 in this course".) ## 1/2/17 ;wip: slides up to slide 192 ish, MAC schemes (a lot like the cipher block chaining scheme used for symmetric ciphers), ;wip: MACs can't provide non-repudiation, because at least two parties must have the shared key. ;wip: encrypt-and-mac is insecure because the MAC can leak info about the plaintext. suppose we have a secure MAC H_k(m). construct a new MAC H_k'(m) = H_k(m) \| m. Clearly, this is still a secure MAC, but it leaks the entire plaintext. ## 3/2/17 Instead, we can append the key to the message before we hash it, rather than prepending it. This ensures that the attacker can't use a length extension attack anymore because they don't know what the secret key K is, but it requires the hash function to be collision resistant in order for the resulting MAC to secure. This is called the secret suffix method. If the hash function isn't collision resistant, then we can efficiently find a collision \tup{x_1, x_2}. Suppose that x_1 and x_2 have lengths that are multiples of the hash block length. Then H(x_1 \| K) = H(x_2 \| K), because the values being passed into the hash function start with two different but colliding values. Therefore, we've found a collision for the MAC - a collision attack! To avoid the requirement that the hash function be collision resistant, we can use the envelope method, in which we prepend and append the key to the message before we hash it. This successfully avoids the length extension attack because the attacker doesn't know what K to append, and the collision attack because the attacker probably can't find a collision where both values start with K, without even knowing what the value of K is. The Hash MAC (HMAC) is a commonly used MAC scheme that ;wip. H_k(x) = H(K \oplus \mathrm{opad} \| H(K \oplus \mathrm{ipad} \| x)), where \mathrm{ipad} = 0x36, \mathrm{opad} = 0x5C repeated for each byte of K. The inner hash is just the secret prefix method, but hashing for the second time ensures that the hash can't ;wip. This can be proven to be secure if the compression function used in the iterated hash function H is itself a secure MAC scheme with fixed length messages and a secret IV. HMAC is used a lot in industry, as part of IPSec, SSL, and TLS. In practice, we generally use SHA-1 for the HMAC hash function, and it seems to be unaffected by the known weaknesses in SHA-1. A secure symmetric encryption scheme ensures confidentiality. A secure MAC scheme like HMAC ensures authentication (data integrity and data origin authentication). If Alice needs to send Bob a message with both confidentiality and authentication, one way to do this is Alice can send Bob \tup{c, t} = E_{k_1}(m), H_{k_2}(m), where k_1, k_2 are secret keys pre-shared with Bob. This is called Encrypt-and-MAC. However, the issue here is that the MAC scheme isn't designed for confidentiality, only authentication, so theoretically, the HMAC might leak information about the plaintext. We still use this a lot in practice though, because most MAC schemes don't seem to leak any information about the plaintext. A better way to do it would be to send \tup{c, t} = \tup{E_{k_1}(m), H_{k_2}(E_{k_1}(m))} (ciphertext, tag) instead - we tag the encrypted message using the HMAC, rather than tagging the plaintext. This can be proven to be secure if the encryption and MAC scheme are secure. This is called the Encrypt-then-MAC. Authenticated encryption schemes try to provide both confidentiality and encryption in a more integrated package. The most common one in AES-GCM, which is faster than generic encrypt-then-MAC, as well as authentication-but-not-encrypted headers for data. It uses AES in counter mode (CTR mode - encrypting a counter using the key with AES, and then using the output of that as the CSPRNG for a stream cipher) and a custom MAC scheme, and requires the IV never be reused. One advantage of AES-GCM over something like the encrypt-then-MAC is that it's parallelizable and faster. AES-GCM outputs the IV, authenticated header, authenticated and encrypted message, and the 128-bit authentication tag. ## 6/2/17 AES-GCM is widely used in the real world for combined authentication and encryption, because it's fast and is proved correct by Iwata-Ohashi-Minematsu in 2012, under certain assumptions. In fact, Intel and AMD processors have special instructions to make AES-GCM implementations a lot faster. ## Public Key Cryptography The main drawback of symmetric cryptography is the key establishment problem - how do we distribute secret keys to all the parties that need it, but nobody else? One way to solve this is to distribute keys over a secure channel point-to-point, like a face to face meeting, a trusted courier, or a key embedded in a smart card. However, it's hard to obtain a secure channel, and if we already had a secure channel, we could often just use that channel to communicate instead. Another way is to give the key to a trusted third party, which distributes the secret key as desired. However, finding a trusted third party is quite difficult, and it's also a single point of failure. There's also the key management problem - in a network of n users, each user needs one key for every other user, so there are O(n^2) keys needed. Also, symmetric encryption can't achieve non-repudiation, because there are always multiple parties that could have written a given message (since the key needs to be shared with other parties in order to communicate). Theoretically we could do this by sharing a key only with a trusted third party that ensures non-repudiation, but that kind of defeats the point of using cryptography for non-repudiation in the first place. Public key cryptography works somewhat differently. Using a public key cryptosystem, Alice and Bob first exchange authenticated (rather than secret) information, and then can communicate securely. It was first invented by Merkle, Diffie, and Hellman. It's much easier to exchange information in an authenticated way, then in a secret way. Intuitively this seems impossible, and Merkle's professor originally told him it was impossible back in 1975, but it is indeed possible. The advantages of public key cryptography over symmetric cryptography are: only requiring an authenticated channel rather than a secured one, each user only has one keypair, a signed message can be verified by anyone, and signatures can achieve non-repudiation. The disadvantages are: private/public keys are usually a lot larger than symmetric cipher keys, making them harder to share in some cases, and public key cryptography is usually a lot slower. Suppose Alice and Bob want to communicate securely. Alice and Bob want to establish a session key by communicating over a public, authenticated channel. We'll now describe Merkle's proof-of-concept public key cryptography idea. Alice creates puzzles P_1, \ldots, P_N for some large N, and each puzzle takes time t to solve. The solution to a given puzzle P_i is the session key k_i and the serial number n_i, and Alice keeps track of the solutions to each generated puzzle. Alice sends all the puzzles to Bob, over the authenticated channel. Then, Bob randomly selects a puzzle P_j, solves it in time t to obtain k_j and n_j, then sends n_j back to Alice. Since Alice knows n_j, Alice can easily look up the corresponding k_j. Now, both Alice and Bob know k_j, which they can now use as the session key. In contrast, Eve doesn't know which puzzle has serial number n_j, so Eve must try each puzzle to see if the serial number matches. That means on average Eve must solve \frac N 2 puzzles, spending \frac{Nt}{2} time overall. For large enough values of N like N = 10^9, it should become infeasible for Eve, while Bob can feasibly still solve the single puzzle. One example of a Merkle puzzle is AES-encrypting the session key (128-bit) and the serial number (128-bit) (the serial number is appended to the session key twice, so we can check if the puzzle was solved correctly), with a randomly selected 40-bit string as the secret key. Then we can solve the puzzle by exhaustive key search in 2^{40} operations. More generally, we want each communicating party C to generate a key pair consisting of a public key P_C and a private key S_C, such that it's hard to recover$S_AC from P_C, then publishes P_C while keeping S_C secret.

If A wants to communicate confidentially with B, then A obtains an authentic copy of P_B, then sends B the message encrypted with P_B. Bob then decrypts the message using S_B, which only B knows. Since public key cryptography is so slow, we usually implement this in practice by using public key cryptography to communicate a session key (which is a fixed-length, relatively short value), and then using symmetric cryptography to encrypt the message with that session key.

If A wants to sign a message, A just signs the message with S_A, which only A knows. To verify the signature on the message, we obtain an authentic copy of P_A and verify the signature using P_A. Since public key cryptography is so slow, we usually implement this in practice by hashing the message, and then using public key cryptography to sign the hash of the message (which is fixed-length, relatively short value).

## Algorithmic Number Theory

Recall the fundamental theorem of arithmatic - every integer n \ge 2 has a unique prime factorization.

Just because it exists though, doesn't mean that prime factorization is efficient to find - as of this writing, we are not aware of any way to find the prime factorization in polynomial time - we need to do around O(\sqrt{n}) operations, while the length of the input is around O(\log n) (the number of bits needed to write the input down in a reasonable encoding). This is still an open problem - whether we can factor numbers in polynomial time with respect to the length of the input.

However, it's quite simple to verify whether a given prime factorization of a number is correct - just multiply them together.

In this course, by running time we mean the worst case time complexity. and by efficient we mean polynomial time. We also allow average case time complexity in some cases, especially when we look at probabilistic algorithms.

Suppose we have two integers a, b. Clearly, they have lengths O(\log a) and O(\log b) respectively. Since we usually look at length rather than the numerical value itself, we represent the length as k_a, k_b instead - the number of bits in each number.

Clearly, computing a + b takes O(k_a + k_b) - we always have to at least look at every bit of a and b to get the correct answer, and there are k_a + k_b bits total. Likewise, computing a - b is also O(k_a + k_b). We often assume without loss of generality that k_a = k_b, and just call it k

Multiplication can be done naively in O(k^2), and modern algorithms can do it in as little as O(k^{\log_2 3}) or k^{k \log k (\log \log k)} - it's still an open problem what the optimal value is.

For division, we can naively just repeatedly subtract the shifted version of the denominator from the numberator, zeroing one bit of the numerator each time. That means with the naive algorothm, we need to perform k subtractions, giving us an total time of O(k^2).

The GCD algorithm \gcd(a, 0) = 0, \gcd(a, b) = \gcd(b, a \mod b) continuously finds the remainder of dividing a number by another. Both numbers are positive and the second is strictly decreasing at each step. In fact, we can prove that the values are halved or lower after every two steps.

Since division/remainder takes O(k^2), and we're doing, in the worst case, k steps, we might expect that GCD takes O(k^3). However, GCD actually just takes O(k^2) time, because the values we're dividing are rapidly decreasing.

Let's say we're now doing modular arithmetic, so with the modulus n, all values are within [0, \ldots, n - 1]. Now we can compute a + b \pmod n by computing a + b and subtracting n if a + b \ge n, and we can compute a - b \pmod n by computing a - b and adding n if a - b < 0. That means modular addition and subtraction are both O(k).

Multiplication in modular arithmetic can be done by multiplying normally, and then performing a division by n and finding the remainder - it takes O(k^2). Same goes for finding the inverse, \frac 1 a \mod n

Turns out, exponentiation can be done in O(k^3) time - we can compute a^b \pmod n in cubic time with respect to the sizes of a and b!

## 10/2/17

The naive way to compute a^m \mod n is to compute a^m, then take that number mod n. This potentially gets us a really large result, and is somewhat wasteful. Another slightly improved naive method is to let , then run m times. This solves the issue of having really large intermediate results, but isn't polynomial time and is pretty inefficient.

An improved method is to use the squaring method, which decomposes m into a sum of powers of 2, then successively squares a and multiplies the result for each power:

Since there are at most k modular squarings and k multiplications, each of which is O(k^2), the worst case time is O(k^3).

Recall Fermat's Little Theorem, which says, among other things, that if a and n are coprime, and a^{n - 1} \mod n \ne 1, then we know n is composite (if it is 1 though, we don't know if it's prime or not). This is a really useful primality test - we just need to choose the right a to prove that n is not prime.

## RSA

RSA was invented by Rivest, Shamir, and Adleman in 1977.

To generate a key for a user U:

1. U randomly selects 2 large distinct primes p, q of the same bit length k.
• p and q have to be large so their product n is hard to factor. A good value of k is 1024, 2048, or 4096.
• p and q have to be distinct so n can't just be factored by square rooting the product.
• p and q have to be of the same bit length because there are algorithms for factoring n that take time proportional to the smaller prime factor of n, so we do this to maximize the hardness of factoring n against these algorithms.
2. U computes n = pq and \phi(n) = (p - 1)(q - 1).
3. U selects an arbitrary e such that 1 < e < \phi(n) and e is coprime with \phi(n). A common value of e is 3 - we can just ensure e is coprime by ensuring it's prime.
4. U computes d such that ed \equiv 1 \pmod{\phi(n)} and 1 < d < \phi(n).
• We note that if ed \equiv 1 \pmod{\phi(n)}, then ed + k \phi(n) = 1 for some integer k.
• To solve for d: the extended Euclidean algorithm can compute the GCD of e and \phi(n), and in the process find values for d and k that satisfy ed + k \phi(n) = 1.
• Recall the extended Euclidean algorithm: Initialize x_1 = 1, y_1 = 0, r_1 = e and x_2 = 0, y_2 = 1, r_2 = \phi(n). Let q_n = \floor{\frac{r_{n - 2}}{r_{n - 1}}} and r_n = r_{n - 2} - q_n r_{n - 1}, x_n = x_{n - 2} - q_n x_{n - 1}, y_n = y_{n - 2} - q_n y_{n - 1}. Stop when r_{n + 1} = 0 and gcd(e, \phi(n)) = r_n and d = x_n and k = y_n.
• Since e is prime, the GCD must be 1. If the GCD is 1, then d is guaranteed to be the multiplicative inverse of e, as required.
5. U publishes the public key \tup{n, e}, and keeps the secret key d to themselves.

To send an encrypted message from user U to user V (note that real implementations also need to handle things like side channel attacks):

1. U obtains an authentic copy of V's public key \tup{n_V, e_V}.
2. U represents the message as 0 \le m \le n_V - 1 such that \gcd(m, n_V) = 1 - the message is represented as a single, fixed-size number that is not one of the primes p, q.
3. U computes c = m^e_V \mod n_V.
4. U sends c to V.
5. V computes m = c^d_V \mod n_V. Note that no part of this requires U's public or private keys.

Decryption is essentially computing (m^e)^d = m \mod n. Finding eth roots of a number mod n requires us to factor n, which is hard. The hardness of solving for m is the basis for RSA's security.

Suppose p \mid m. Then m \equiv 0 \pmod p, so m^{ed} \equiv 0 \pmod p, so (m^e)^d = m \pmod p.

Suppose p \nmid m. Since ed \equiv 1 \pmod{\phi(n)}, ed = 1 + k\phi(n) = 1 + k(p - 1)(q - 1) for some k. By Fermat's Little Theorem, m^{p - 1} \equiv 1 \pmod p, so taking both sides to the power of k(q - 1), we get m^{k(p - 1)(q - 1)} \equiv 1^{k(q - 1)} \pmod p. Multiplying both sides by m, we get m^{1 + k(p - 1)(q - 1)} \equiv m \pmod p, or m^{ed} \equiv m \pmod p. By a similar argument, m^{ed} \equiv 1 \pmod q. Since p and q are coprime (since they're distinct primes), m^{ed} \equiv m \pmod{pq}, so m^{ed} \equiv m \pmod n.

An implementation of the extended Euclidean algorithm:

## 13/2/16

The obvious attack on RSA is to factor n to get p, q, and then use the extended Euclidean algorithm to get d - we just need to compute the private key. After decades of research, however, the consensus is that it's a hard problem.

In other words, we can define the basic RSA attack as "given \tup{n, e}, find p, q by factoring". Alternatively, "given \tup{n, e}, compute d". Note that the compute-d attack is at least as hard as factor-n attack, because if we have p, q, we can easily compute d, but if we have d, it's not particularly easy to get p, q. More formally, there's a turing reduction from every compute-d problem to a corresponding factor-n problem - if we can factor n quickly, we can compute d quickly, by using the oracle to factor n to get p, q, then compute \phi(n) = (p - 1)(q - 1) and ed \equiv 1 \pmod{\phi(n)}.

Also, it turns out there's a turing reduction from every factor-n attack to a compute-d problem. Therefore, since factor-n and compute-d both turing reduce to each other, they are polytime equivalent (denoted A_1 = A_2, where A_1 is factor-n and A_2 is compute-d). Knowing this is useful because factoring is a very well studied problem, and we've spent centuries convincing ourselves that it's a difficult problem.

The RSA problem is defined as: given \tup{n, e} and 0 \le c \le n - 1, find 0 \le m \le n - 1 such that c \equiv m^e \pmod n. In other words, we're trying to find the eth root of c under mod n. It turns out that the RSA problem turing reduces to factor-n, so it's no harder than factoring. ;wip: why?

However, we'd really like the factor-n problem to turing reduce to the RSA problem, because that would mean the RSA problem is at least as hard as factoring. However, this is an open question, and nobody has managed to prove it yet. That said, we generally just assume the RSA problem is hard, because we want that to be the case and nobody'd solved it yet.

For RSA by itself, a dictionary attack is posible - if the plaintext space is small and predictable enough, the attacker can precompute every possible plaintext/ciphertext pairs. To prevent this, we can append/prepend a random salt (e.g., a random 128-bit string) to the plaintext before encrypting it, then remove it after decrypting the ciphertext (e.g., by trimming it off of the decryption result to get the plaintext). That means the plaintext space is no longer small and predictable.

For RSA by itself, a chosen plaintext attack is also possible - given c and a decryption oracle f(c') = m' for any c' \ne c (we can decrypt any ciphertext except the one we're given), we can compute m from c. How does this work?

Suppose without loss of generality that \gcd(c, n) = 1 (because if \gcd(c, n) \ne 1, then it must be a factor of n, so we could easily find p, q and compute d).

Now, select an 2 \le x \le n - 1 such that \gcd(x, n) = 1 (again, if the GCD wasn't 1 then we've factored n

selection, choice, option, appointment

giving, issuing, grant, distribution