So in unit one, we learned about symmetric ciphers. In unit two, we're going to learn about how to apply symmetric ciphers to solve problems.
For the rest of this course we are mostly going to view ciphers as black boxes. They provide two functions: encrypt and decrypt. Encrypt takes in a message from some message space and a key from some key space, and produces a ciphertext. Decrypt is the inverse: it takes in a ciphertext and a key, and if it takes in the same key that was used to encrypt, it will produce the original message.
Our correctness property is that decrypting, with the same key, a message encrypted with that key gives us back the original message. For the rest of this unit we are going to assume that we have some function that provides encryption and decryption; AES is a good choice for most applications. We're going to look at how to use these functions to solve problems.
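To make the black-box interface concrete, here's a minimal sketch in Python. A real application should use AES from a vetted library such as PyCryptodome; the toy XOR cipher below is my own stand-in, chosen only because it needs no dependencies--it is obviously not secure:

```python
def encrypt(key: bytes, message: bytes) -> bytes:
    # Toy stand-in for a real cipher like AES -- NOT secure.
    return bytes(m ^ k for m, k in zip(message, key))

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    # XOR is its own inverse, so decryption looks the same here.
    return bytes(c ^ k for c, k in zip(ciphertext, key))

key = b'sixteen byte key'
msg = b'attack at dawn!!'

# The correctness property: decrypting with the same key recovers the message.
assert decrypt(key, encrypt(key, msg)) == msg
```

Whatever cipher you substitute for the toy functions, the round-trip property above is the contract the rest of the unit relies on.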
The first thing we're going to look at though is how we actually get this key. All of our assumptions about security depend on this key, and we have two important assumptions.
The first is that the key k is selected randomly and uniformly. This means each key is equally likely to be selected, and there's no predictability about what the key is. The other big assumption is that the key can be kept secret: the adversary can't learn the key, but it can be shared between the two endpoints. This is a big challenge. We're not going to look at that in this unit; this is the big problem of key distribution, which we'll talk about starting in unit 3. For now I want to talk about the first problem--what we need to select a random key.
If we're going to generate random keys, we need some understanding of what randomness is. This is a very deep philosophical question, and I'm not much of a philosopher, so let's try a quiz instead. The question is, which of these sequences is the most random? So here are the sequences.
I won't try to read them, but see if you can figure out which one of these is the most random. I should warn you in advance this is a bit of a trick question, if you haven't already decided that.
The correct answer--other than refusing to answer the question, which would be a very correct way to answer such a question-- the correct answer is the third choice. There are some ways to know which ones are not correct. I'll show you soon how each of these sequences was generated.
Well, if you look at the first one, you see there's a sequence of 0-1-1-0-1-1 that keeps repeating. Anything with a repeated pattern like that is definitely not random. Now, the caveat is certainly if you generated a long enough random sequence, eventually it would contain this pattern inside.
The fourth one is probably the trickiest--when humans generate random sequences, they look sort of like this. The property this sequence has is that there's never more than two repetitions in a row. When humans are asked to generate random sequences, they usually don't include long runs of repetition, but real random sequences do have them: in the random sequence you can see a run of five 0s here, and a run of four 1s--there's lots of repetition. In this one there's never more than two in a row, so this is not random.
The other two look potentially random; you'd have to analyze them a little more carefully. One of them has more structure, and one is actually random--or at least close to random. Let me show you how each of these was generated. The first sequence was generated by generate_sequence (I'll show you what generate_sequence does next). For each position in the sequence, the procedure that generates the output produces 0 if the position is divisible by 3, and otherwise produces 1, and we do that for the length of the sequence, which is defined as 88. display_bits turns a list of bits into a string. All this code will be available, linked from the course. So that is what generated the first sequence. The code for generate_sequence just maps the function onto the range from 0 to n - 1, so for that length we're seeing the output of the function at every position. That's the first one--and it's clearly non-random.
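Since the course code is linked rather than shown in full here, this is my reconstruction of what generate_sequence and display_bits might look like in Python:

```python
def generate_sequence(n, f):
    # Map f over the positions 0 .. n-1.
    return [f(i) for i in range(n)]

def display_bits(bits):
    # Turn a list of bits into a string like '011011...'.
    return ''.join(str(b) for b in bits)

# First sequence: 0 when the position is divisible by 3, otherwise 1.
seq = generate_sequence(88, lambda i: 0 if i % 3 == 0 else 1)
```

Printing display_bits(seq) shows the repeating 0-1-1 pattern mentioned above.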
The second one takes a string, converts it into bits, and displays those bits. The string is a great quote from John von Neumann: "anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." But, of course, we do it anyway. So that's definitely not random--it's generated from a very specific string. If you knew the string and what these procedures do, you could generate that sequence, and the sequence it generates is the one that was the second choice here. It looks sort of random, depending on how carefully you look at the patterns, but it's definitely not random, because it comes from this quote from John von Neumann.
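A sketch of the string-to-bits conversion (assuming 8-bit ASCII codes; the course's actual helper may differ in details):

```python
def string_to_bits(s):
    # Each character becomes the 8 bits of its ASCII code.
    return [int(b) for ch in s for b in format(ord(ch), '08b')]

quote = ("anyone who considers arithmetical methods of producing random "
         "digits is, of course, in a state of sin.")
bits = string_to_bits(quote)
```

The resulting bit pattern is completely determined by the quote, which is exactly why the sequence only looks random.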
The third one is actually designed to be random. We're using the random choice method provided in the PyCrypto library to generate random numbers. It claims to generate fairly good cryptographic random numbers, so this third sequence, at least, is generated in a way that is designed to be random.
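PyCrypto is no longer maintained; if you want to reproduce this today, Python's standard-library secrets module is the usual choice for cryptographic-quality random choices. A minimal equivalent sketch:

```python
import secrets

def random_bits(n):
    # secrets.choice draws from the operating system's cryptographic
    # random source, the modern stand-in for PyCrypto's random choice.
    return [secrets.choice([0, 1]) for _ in range(n)]

bits = random_bits(88)
```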
The fourth one is generated using this procedure called generate_fake_random, which starts by making random choices, but is designed to trick humans by not allowing repetitions greater than 2 (and I should update my comment: I had originally done it with runs of up to 3, which looks very random to most humans, but even for a trick question that seemed a little too tricky). This is a good test for humans. One of the experiments people sometimes do is to ask humans to come up with a sequence of random numbers from their own head--numbers they think are random--and they come up with sequences that are easily recognizable as non-random, because they don't like to have too much repetition.
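Here's my reconstruction of how generate_fake_random might work, with the run limit as a parameter so you can try 3 as well:

```python
import random

def generate_fake_random(n, max_run=2):
    # Random-looking bits, but never more than max_run equal bits in a
    # row -- the kind of sequence humans tend to produce when asked
    # for "random" digits.
    bits = []
    for _ in range(n):
        b = random.choice([0, 1])
        if len(bits) >= max_run and all(x == b for x in bits[-max_run:]):
            b = 1 - b  # flip to break the run
        bits.append(b)
    return bits
```

A real random source would eventually produce runs of any length, so the capped runs are what give this sequence away.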
The previous question assumes some notion of randomness, but randomness is a very fuzzy concept to define. What I think is the most useful definition is the one based on Kolmogorov Complexity, an idea developed by the Russian mathematician Andrey Kolmogorov as well as by the Argentinian Gregory Chaitin. It's a way of measuring the complexity of a sequence, defined as the length of the shortest possible description of that sequence. For this to be well-defined, we need to understand more precisely what a description is. One way to do this would be to say it's a Turing machine and decide on some formal way of writing down a Turing machine, but we could use any method of describing algorithms that we want--it could be a Python program. Whatever we select as our description language, Kolmogorov Complexity is well-defined, and as long as that description language is powerful enough to describe any algorithm, it's a reasonable way to define complexity, and we can use it to define a notion of randomness. A sequence is random if the shortest way to describe it is the length of the sequence plus some constant. That means as we make the sequence longer, the description gets longer at the same rate. This matches the notion of randomness that we used informally in the quiz. If there's a short program that can produce the sequence, the sequence is not random: it has some structure to it, and the program shows us what that structure is. If there isn't any short program that can describe the sequence, that's an indication that the sequence is random--that there's no simpler way to understand it other than to see the whole sequence. So this seems like a useful notion for understanding randomness. Next is a quiz to see if it's also a useful notion for measuring randomness.
So the question is: for a given sequence S, is there a way to compute the Kolmogorov Complexity of S? The first choice is yes, we can: it is the length of S plus some constant. The second choice is yes, we can--it's impractical, but here's an algorithm for doing it. We start by initializing n to 1. We have a loop that keeps going until we find the correct value of n, which is the Kolmogorov Complexity of the string S, and we do that by looping through all the programs of length n. This is a big step, but it's finite, so in theory it would eventually finish. For each program, we execute that program and see if it produces S as its output; if it does, that's the result we want. And the third choice is no, it's theoretically impossible.
The answer is that it's theoretically impossible to compute this for an arbitrary sequence. The first answer is not correct. If S is truly random, it would be correct, but if S is not truly random, there might be some shorter way to compute it; the length of S plus a constant only gives an upper bound on the Kolmogorov Complexity of S. An example of such a program would just be the Python program print S, which prints out the sequence S. Its length would be the length of S plus the 5 characters for "print" plus 1 for the space. But that doesn't prove that there isn't a shorter program that can produce S. The trick in the second answer is the execution step: it assumes that we can execute P to completion. In order to do that, we would need to solve the Halting Problem--we would need to know that P will never finish, or know how long to keep running P. So we can't actually do this step; the program might run forever, the loop may never finish, and we would never learn the result. That's why this doesn't work. This is not enough to prove that computing the complexity is impossible, because maybe there's some other way to do it. I'm not going to go through how we actually prove that the Kolmogorov Complexity is uncomputable, but I'm going to show you a paradox that will give you a little bit of an idea why that is.
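The trivial upper-bound program can be written out concretely. Here it's in Python 3, where print is a function call, so the constant is slightly larger than the 6 characters counted in the lecture's Python 2 style--but it's still a constant independent of S:

```python
def trivial_program(s):
    # A program whose output is s: its length is an upper bound on the
    # Kolmogorov complexity of s, up to a constant.
    return 'print(' + repr(s) + ')'

p = trivial_program('0110110110')
# The constant is len('print()') plus 2 characters for the quotes.
assert len(p) == len('0110110110') + len('print()') + 2
```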
The paradox is known as the Berry Paradox. What the paradox asks is, "What is the smallest natural number"--and by natural numbers I mean 1, 2, 3, and so on; sometimes we define them starting from 0, and the paradox works just as well either way--"that cannot be described in eleven words?" This seems like a reasonable question. We can define the set of numbers that cannot be described in eleven words; it seems like a perfectly reasonable description of a set. It's also the case that any nonempty set of natural numbers has a smallest element--the natural numbers are well-ordered--so whatever that set is, there must be some smallest number in it. That means there should be an answer to this question: there is a smallest number that cannot be described in eleven words, and I'm going to describe it for you. Here's my description... and my description has 1, 2, 3, ... 10, 11 words. So I just described the smallest natural number that cannot be described in eleven words, but I've described it in eleven words. That suggests no such number exists, which contradicts the properties we had before. That's why this is a paradox. This is certainly not a proof that the Kolmogorov Complexity is uncomputable, but it should give you some sense of how we might prove that.
So, Kolmogorov Complexity gives us an interesting way to think about randomness. It's certainly a useful notion in many areas of computer science, but it's not going to give us a practical way to tell if a sequence is random. It won't even give us a theoretical way to do that, because it's uncomputable. The next thing we might think about trying is statistical tests. These, in fact, are often used as part of an argument that something produces random numbers. They should be part of that argument, but they can never be the whole argument. The only thing we can do with a statistical test is show that something is non-random; we can't ever prove that it is random, because we can always find some non-random sequence that satisfies all of our statistical tests. So we're left with our best argument being how the sequence is actually generated. If we know how the sequence is produced, we can have some confidence that it satisfies the randomness requirements we have, at least for generating a good key, and the real requirement there is unpredictability. We want a sequence of numbers x_0, x_1, x_2, ..., where each x_i is in the range 0 to 2^n - 1, such that even an attacker who has seen the first m numbers in the sequence can only guess the next one with the same probability they would have guessing by chance: 1 out of 2^n, where 2^n is the number of possible values for each element. That's the property we want. Since we can't get it using Kolmogorov Complexity (which is uncomputable) or statistical tests (which can only show non-randomness), the way we're going to try to achieve it is by generating the sequence in a way that produces values that are unpredictable to the attacker.
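To illustrate what a statistical test looks like--and why passing one proves nothing--here's a minimal frequency ("monobit") test sketch; the threshold commentary is my own summary of common practice, not from the course:

```python
import math

def monobit_statistic(bits):
    # Normalized difference between the counts of ones and zeros.
    # For a truly random sequence this is approximately standard normal,
    # so a value far from 0 (say |value| > 3) signals non-randomness.
    # A small value proves nothing: tests can only reject.
    s = sum(1 if b else -1 for b in bits)
    return s / math.sqrt(len(bits))

assert monobit_statistic([0, 1] * 500) == 0.0   # balanced, yet obviously non-random
assert monobit_statistic([1] * 100) == 10.0     # wildly unbalanced: rejected
```

Note that the alternating sequence [0, 1, 0, 1, ...] passes this test perfectly while being completely predictable--exactly the limitation described above.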
Quantum Mechanics provides a notion of random events--that there are events in the universe that are inherently random--and we can count things like radioactive decay with a Geiger counter and use that to generate randomness from physical events. Thermal noise is easier to measure in most circumstances; at some level it also depends on Quantum Mechanics and produces randomness, and many modern processors have a way of generating a small amount of randomness by measuring thermal noise in the processor. Whether it's really physically random depends on a lot of other things.
You can also look at things that actually happen and hope that they are random--maybe key presses or user actions are random. An example of this is when we generate a new key using GPG: when you start to generate a key, it says it needs lots of random bytes, and you can perform some type of action, like moving the mouse or using the disks, to help generate more randomness for it. Humans aren't good at doing random stuff, though. When we move the mouse, we're probably moving it in a pattern; when we type on the keyboard, maybe we're doing things that are not very random. So unless you're generating your randomness from quantum physics, there's always some question whether it's really random enough, or whether someone could predict the particular motions I made. Certainly, given that this has been recorded and released, the fake key that I generated for Alyssa B. Hacker should not be used for any secure purpose. This approach of waiting for physically random events is OK for GPG, maybe, because someone using it is probably patient enough to sit around for a while, doing random stuff as well as a human can, to generate a key.
This would not work very well when you need more randomness more quickly, and that happens every time you do a web transaction. Every time someone starts a secure web session--any time you see the lock icon in your browser--there's a protocol going on called TLS. We'll talk about that more in a later unit, but one thing it requires is a new random key for each secure web session. I don't think many people would tolerate being asked to move their mouse around and do strange things every time they connect to a website, in the hopes of generating enough randomness for a secure key. So we need something better. We need a way to take a little bit of physical randomness, usually known as the seed--that's the initial state--and use it as the input to what's known as a pseudo-random number generator, which produces a long sequence of random bits, longer than the amount of physical randomness we started with. So that's our goal: to take a small amount of physical randomness--some source of entropy that we can use as a seed--and have some function that computes from that seed a long sequence of apparently random bits.
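The shape of that idea can be sketched with a hash function expanding a short seed. This is an illustration of the concept only: real systems use vetted constructions such as HKDF or NIST's CTR-DRBG, not ad hoc designs like this:

```python
import hashlib

def prg(seed: bytes, nbytes: int) -> bytes:
    # Expand a short seed into a longer stream by hashing seed || counter.
    # Sketch of the seed-expansion idea -- not a production generator.
    out, counter = b'', 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, 'big')).digest()
        counter += 1
    return out[:nbytes]
```

The output is deterministic given the seed, so all the unpredictability must come from the seed itself--which is exactly why the seed needs a good physical entropy source.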
So for this quiz, I want to see if you can figure out a sensible way to do this. So we've got some pool of randomness... and one approach would be to extract a seed from that pool, use our Encryption function, we'll make the Seed the Message, we'll select the Key as Zero, we'll get an output from that, and we'll make a copy of the output. That's going to be our first random value. Then we'll encrypt again, using the previous Output as the new Input, using the key Zero again and extracting the next random number. So the second option: we're going to extract the seed from our random pool, we'll use that as the key to encrypt and for the first random output, we'll get the result of encrypting Zero with that key. For the next one, we'll get the result of encrypting One with that key. We'll keep using the same key, encrypting and getting a new random value as our output.
For our third option, we'll extract new values from the random pool each time, we'll encrypt a message, selected from the random pool, with the key selected from the random pool, to get X_0. To get the next random number, we'll extract a new message from the random pool, and a new key, and do that encryption. So which one of these options makes the most sense for a pseudo-random number generator? And let's assume our encryption function behaves well--it provides a mapping between keys and message that's hard to invert if you don't have the key, and that we have a random pool that provides a limited, but good, source of randomness.
One way to avoid some of those problems is to use Cipher Block Chaining. The idea here is that we use the ciphertext from the previous block to impact the next block.
Here's what this looks like. We're still going to break our message into blocks. The idea of Cipher Block Chaining is that we take each message block and encrypt it with our encryption function--let's assume it's still AES--using the same key for each block, and we get a ciphertext block as output.
Instead of doing each block independently, though, and having the codebook property, for the second block we're going to take the ciphertext that came out of the first block, XOR that with the message block, and then make that the input to the encryption function. And this keeps going.
This means that in CBC, or Cipher Block Chaining, mode, the i-th ciphertext block is the result of encrypting the XOR of the i-th message block with the (i-1)-th ciphertext block: C_i = E_k(M_i XOR C_(i-1)).
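That chaining rule can be sketched in code. AES itself needs a library, so the block function below is a toy invertible stand-in of my own--only the chaining structure is the point:

```python
def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Invertible stand-in for AES (XOR with key, rotate one byte) -- NOT secure.
    x = bytes(a ^ b for a, b in zip(block, key))
    return x[1:] + x[:1]

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key, iv, message_blocks):
    prev, ciphertext = iv, []
    for m in message_blocks:
        c = toy_block_encrypt(key, xor_bytes(m, prev))  # C_i = E_k(M_i xor C_{i-1})
        ciphertext.append(c)
        prev = c
    return ciphertext

key = b'0123456789abcdef'
msg = [b'same first block', b'second blk here!']
# Different IVs give different ciphertexts even for identical first blocks.
assert cbc_encrypt(key, b'A' * 16, msg)[0] != cbc_encrypt(key, b'B' * 16, msg)[0]
```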
We have a little bit of an issue with the first block: there's no previous ciphertext block to chain with. And if we just did what was shown here, we'd still have a problem--we would see repetition whenever the first block of one file is the same as the first block of another file encrypted with the same key. We'd get the same C_0 out.
So we don't want that. We're going to add what's called an "initialization vector," or IV, and we'll XOR that with the first message block. That keeps things consistent: each message block is XORed with something before it is encrypted. This might worry us, since we're adding more secrets and we want to keep the number of secrets as small as possible, but the IV doesn't actually need to be kept secret.
Note that the XORed result still passes through the encryption function, so the IV doesn't need to be kept secret. It shouldn't be reused, though: it's fine for the IV to be something unsecret, as long as it's not always the same.
So the question is: suppose that Alice forgets the value that she used for the initialization vector. She encrypted some file; she has the ciphertext and she has the key. Can she still recover the message blocks, even though she forgot the IV?
"No," she can't recover any of them;
"Almost," meaning she can recover all of them except for the very first block;
"Almost," meaning she can recover all of them except for the first and the second blocks; or that she can only recover the very last block.
The answer is the second choice--she can recover almost the full message, everything except for the very first block. The point of the initialization vector is just to hide repetition among encryptions that would otherwise appear in the first block. To see why, look at how the encryption mode behaves. For all of the blocks except the first one, C_i is equal to the encryption, using the key k, of M_i XOR C_(i-1). The exception is block C_0, which is the value of encrypting M_0 XORed with the IV. We didn't explain how to do decryption, but from the way the encryption works, you should be able to figure that out.
We can look at this backwards. To get the last message block, we decrypt, using key k, the last ciphertext block. That doesn't give us the message block directly; to get message block n - 1 we XOR the result with the ciphertext value of the previous block, which we already have--remember, to decrypt we start with all the ciphertext blocks. Each block works the same way: to get message block i, we decrypt ciphertext block i and XOR that with the previous ciphertext block. We can do that for all the blocks except one. The exception is the first one, whose encryption used the IV: to get the first message block, we decrypt the first ciphertext block and then XOR that result with the IV. So if we lose the IV but don't lose the key, and don't lose the ciphertext, we've lost just the first block. If the IV was selected perfectly at random, we have no information at all about the first block, because whatever comes out of the decryption is XORed with the IV to get the message. With no information about the IV, we have no information about the first message block; but we can decrypt all the other blocks.
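This argument can be checked directly. Using a toy invertible block function as a stand-in for AES (everything is redefined here so the sketch stands alone), decrypting with a wrong IV garbles only the first block:

```python
def toy_block_encrypt(key, block):
    # Invertible stand-in for AES -- NOT secure.
    x = bytes(a ^ b for a, b in zip(block, key))
    return x[1:] + x[:1]

def toy_block_decrypt(key, block):
    x = block[-1:] + block[:-1]
    return bytes(a ^ b for a, b in zip(x, key))

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key, iv, blocks):
    prev, out = iv, []
    for m in blocks:
        c = toy_block_encrypt(key, xor_bytes(m, prev))
        out.append(c)
        prev = c
    return out

def cbc_decrypt(key, iv, blocks):
    prev, out = iv, []
    for c in blocks:
        out.append(xor_bytes(toy_block_decrypt(key, c), prev))  # M_i = D_k(C_i) xor C_{i-1}
        prev = c
    return out

key = b'0123456789abcdef'
iv = b'initial vector!!'
msg = [b'first blk 16byte', b'second blk 16byt', b'third block 16by']
ct = cbc_encrypt(key, iv, msg)
assert cbc_decrypt(key, iv, ct) == msg          # correct IV: full recovery
wrong = cbc_decrypt(key, b'X' * 16, ct)
assert wrong[0] != msg[0] and wrong[1:] == msg[1:]  # wrong IV: only block 0 lost
```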
The next mode of operation I want to talk about is called CTR mode, which stands for "Counter."
We have a message divided into blocks, just as before. We'll have our encryption function, just as before--we can think of it as AES or any other block cipher--that takes a key as input, and we'll use the same key. But instead of having a message block go in, we're going to have a counter: some value that cycles through the natural numbers. That's going to be our input message, and out of that we'll get some ciphertext.
So far none of this has anything to do with the message--we've just encrypted the counting values from zero to n - 1. What we're going to do is XOR those outputs with the message: the message blocks go into these XORs, and what comes out is the ciphertext blocks. If we did it just like this, we wouldn't have quite as much security as we would like. We'd be vulnerable to attacks that search the space of counters, and we'd also be vulnerable because we're using the same sequence of counters for every file that we encrypt with the same key. The solution to this is similar to what we did with the initialization vector in the previous mode: we're going to add a "nonce," which we'll concatenate with the counter value.
A nonce is very similar to a key. A nonce is a one-time, unpredictable value. Unlike a key, it doesn't need to be kept secret. The point of the nonce is to make sure every time we use counter mode with the same key we get different blocks out. So as an example with AES, if we have 128 bits as the block size, we might have a 64-bit nonce and a 64-bit counter.
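Here's a sketch of counter mode with an 8-byte nonce and 8-byte counter. Since CTR mode never runs the block cipher backwards, a keyed hash can stand in for AES for illustration (my own stand-in, not a vetted construction):

```python
import hashlib

BLOCK = 16  # bytes, like AES-128

def keystream_block(key: bytes, nonce: bytes, i: int) -> bytes:
    # Stand-in for E_k(nonce || counter): 8-byte nonce, 8-byte counter.
    return hashlib.sha256(key + nonce + i.to_bytes(8, 'big')).digest()[:BLOCK]

def ctr_crypt(key, nonce, blocks):
    # Encryption and decryption are the same operation: XOR with the keystream.
    return [bytes(a ^ b for a, b in zip(blk, keystream_block(key, nonce, i)))
            for i, blk in enumerate(blocks)]

key, nonce = b'k' * 16, b'n' * 8
msg = [b'0123456789abcdef', b'fedcba9876543210']
ct = ctr_crypt(key, nonce, msg)
assert ctr_crypt(key, nonce, ct) == msg  # applying it twice recovers the message
```

Note that a fresh nonce per message is what prevents two files encrypted under the same key from sharing a keystream.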
So let me summarize these two modes. We saw Cipher Block Chaining mode, and we saw Counter mode. For CBC mode, the i-th block of the ciphertext is the result of encrypting, using the key, the i-th block of the message XORed with the previous ciphertext block: C_i = E_k(M_i XOR C_(i-1)). We need a slightly special case for block zero, which uses an initialization vector in place of the -1st ciphertext block, which doesn't exist. With Counter mode, the i-th ciphertext block is the result of encrypting the value of i--that's our counter--together with some nonce (I'm writing this as concatenation, so we have 64 bits for the nonce pasted next to 64 bits for the counter), and that is XORed with the corresponding message block: C_i = E_k(nonce || i) XOR M_i. To do decryption with Counter mode, the i-th message block is the i-th ciphertext block XORed with this same value, which we can compute since we know the key. With CBC mode, the i-th message block is the result of decrypting the i-th ciphertext block and XORing that with the previous ciphertext block--or, in the case of the very first block, XORing it with the IV.
So now it's time for a quiz to see if you understand the differences between CBC mode and Counter mode. So the question is: If Alice wants to quickly encrypt a large file and she's got a multi-core machine, or she's got access to a large distributed computing resource, which of these two modes will allow her to encrypt the file more quickly?
The choices are:
that it doesn't matter;
that CBC mode would be faster;
or that CTR mode would be faster.
The answer is that Counter mode would be faster. The reason is that with CBC mode there's a data dependency: we can't start encrypting a block until we know the value of the previous ciphertext block, so we can't run all the encryptions in parallel. That's one major drawback of CBC mode. With Counter mode, we can actually compute all of the encryptions before we even know the message. The expensive operation here is not the XORs--XORs are very cheap; encryption is expensive. So we could pre-compute, for a given nonce, all of the encrypted counters, and then, once we know the contents of the file, do the XORs--also in parallel, though those are very quick. All of the encrypted blocks can be computed in parallel using many processors, and that's a big advantage of Counter mode over CBC mode.
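The precomputation point can be sketched directly: every keystream block depends only on the key, nonce, and counter, so they can all be produced in parallel before the message exists. The thread pool and the keyed-hash stand-in for AES are my own illustration:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def pad(key: bytes, nonce: bytes, i: int) -> bytes:
    # Stand-in for E_k(nonce || i); independent of the message.
    return hashlib.sha256(key + nonce + i.to_bytes(8, 'big')).digest()[:16]

def precompute_pads(key, nonce, nblocks):
    # No data dependency between blocks, unlike CBC, so map in parallel.
    with ThreadPoolExecutor() as ex:
        return list(ex.map(lambda i: pad(key, nonce, i), range(nblocks)))
```

Once the message arrives, encryption is just one cheap XOR per block against the precomputed pads.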
The last mode of operation I'm going to talk about is called Cipher Feedback Mode, also known as CFB. There are many different modes of operation, and we won't talk about them all, but the ones I've covered should give you a good sense of how modes of operation work. This one's a bit different from the others we've seen so far, although it does use some encryption function as a black box, like the others.
We'll call the inputs to that encryption the X-values: X_0, X_1, X_2, ... are the inputs to successive encryptions. The first one will be an initialization vector, similar to how we've used IVs in other modes of operation, and that is the input to encrypt, along with a key. The output of encrypt is some encrypted block. We'll use n as the block size; for AES, we'll assume n is 128 bits--whatever the block size is, the size of the input block and the size of the output from encrypt is 128 bits. This mode has an additional parameter, which we'll call s, and s is how we're going to divide the output. We take the first s bits of the output, and those are XORed with the message block, producing the output ciphertext. This looks similar to Counter mode, except we haven't used the entire output. The other thing we're going to do is use the ciphertext to update the X-value: we take these s bits and put them into the next X-value, shifting the old value s bits to the left. That means we take the n - s bits that are the right part of the previous X-value and move them into the left part of the new one.
Everything else proceeds the same way: we encrypt the X-value, we get our output block, we take the left-most s bits of it, we XOR that with the message, we get our ciphertext block, and this keeps going. We can describe that process formally. First, the X-values: the value X_i is the result of taking the previous value, X_(i-1), from position s to the end--I'm using Pythonic notation here, so X_(i-1)[s:] is the rightmost n - s bits--and concatenating that with the previous ciphertext block: X_i = X_(i-1)[s:] || C_(i-1). Since this is defined as a recurrence, we need an initial value, and that is given by the IV, the initialization vector. The ciphertext values are computed by taking the output of encrypting, using key k, the X-value for that position, keeping just the first s bits, and XORing that with the message block: C_i = E_k(X_i)[:s] XOR M_i. This is how we compute the ciphertext blocks in Cipher Feedback Mode. The important thing to notice is the additional parameter s, which is the size of the message block. The value of s should be less than the value of n--the normal block size of the cipher--otherwise we wouldn't have any feedback input left, and it would end up being a different mode.
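Here's a sketch of CFB with s measured in bytes rather than bits, to keep the Python simple. The keyed hash is a stand-in for AES (valid for illustration because CFB, like CTR, only ever runs E in the forward direction):

```python
import hashlib

N = 16  # block size in bytes, like AES-128

def toy_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES; CFB never needs the inverse of E.
    return hashlib.sha256(key + block).digest()[:N]

def cfb_encrypt(key, iv, msg_blocks, s=4):
    x, cts = iv, []
    for m in msg_blocks:                 # each m has s bytes
        o = toy_encrypt(key, x)[:s]      # leftmost s bytes of E_k(X_i)
        c = bytes(a ^ b for a, b in zip(m, o))
        cts.append(c)
        x = x[s:] + c                    # X_{i+1} = X_i[s:] || C_i
    return cts

def cfb_decrypt(key, iv, ct_blocks, s=4):
    x, msgs = iv, []
    for c in ct_blocks:
        o = toy_encrypt(key, x)[:s]      # same forward encryption as above
        msgs.append(bytes(a ^ b for a, b in zip(c, o)))
        x = x[s:] + c
    return msgs
```

Notice that decryption calls toy_encrypt, never its inverse--this is exactly the property the next quiz asks about.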
Now we'll have a couple of quizzes to see that you understand Cipher Feedback Mode. The first question is: how does one decrypt a message encrypted using Cipher Feedback Mode?
Here are the choices:
The first choice is we have to go through the blocks backwards, XORing out the cipher text from the results of encrypting the X values, and we can compute the X values in reverse order like this.
The second choice is that we do the decryption going forward, XORing out the ciphertext with the encrypted X_i values and updating them as we did in encryption.
A third choice is the same as the second choice except instead of using the encryption function, we're using the decryption function, which is the inverse of the encryption function.
The answer is the second choice. The first choice doesn't make any sense at all. The difference between the second and third choices is whether we use the encryption function or the decryption function. The reason we use the encryption function, just as we did when encrypting, is that when we're decrypting we have the ciphertext blocks and we're trying to get the messages, so we're just going through these XORs in the reverse direction. That means we need exactly the same values we had in encryption to XOR out--the results of encrypting the X_i values.
For this quiz, I'm going to list some properties, and you should check the box for the ones that are true for the given operation mode. So which modes require that E--the encrypt function that we've been using in the black box-- is invertible? Check this box if Cipher Block Chaining mode requires this, and check this box if Cipher Feedback Mode requires this. Which of them require the Initialization Vector to be kept secret? Can use small message blocks? Which can provide strong protection against tampering, so Alice would be able to know if someone had changed values of parts of her file? And the final one--that the final cipher text depends on all of the blocks in the message.
The answer is no--it is actually impossible to construct one. This function must be deterministic, so it produces the same output for the same input. We want it to produce a uniform distribution, which means it would need to add randomness to what comes in, but that's impossible: since it's deterministic, the only randomness it can use is what's provided by x, and there's no way to amplify that to provide more randomness in the output. So this is actually impossible, but we're going to assume such functions exist anyway. The assumption is that, even though it's impossible in theory, in practice we can build hash functions that provide mappings that are computationally very difficult to invert and indistinguishable from uniform by an adversary with a reasonable amount of computational resources.
So now we'll assume that we do have such a function: a function H that acts like a random oracle and provides the properties of an ideal cryptographic hash function. Let's try our coin-tossing protocol again. Here's our new protocol design. Alice will pick a number x, 0 or 1, representing heads or tails. She'll compute m, the hash of x, using our ideal cryptographic hash function, and she'll send m to Bob. Bob will make a guess g and send that guess back to Alice. Then Alice will send her claimed value of x back to Bob. Bob can check if the hash of x equals m. If that's not the case, then Bob suspects that Alice has cheated. If x is equal to g, Bob wins. So, do we like this protocol? We'll assume that H is an ideal hash function. Which of the parties, if any--or both--can cheat with this new protocol that we've defined?
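An honest run of this protocol can be sketched in a few lines of Python. This is my own sketch, with SHA-256 standing in for the ideal hash function H (an assumption; the lecture doesn't fix a concrete hash):

```python
import hashlib
import secrets

def H(x: int) -> bytes:
    # SHA-256 standing in for the lecture's ideal hash function H.
    return hashlib.sha256(str(x).encode()).digest()

def coin_toss() -> str:
    x = secrets.randbelow(2)   # Alice picks 0 or 1
    m = H(x)                   # ...and commits by sending m = H(x) to Bob
    g = secrets.randbelow(2)   # Bob guesses without seeing x
    # Alice reveals x; Bob verifies the commitment before accepting the result.
    assert H(x) == m
    return "Bob wins" if x == g else "Alice wins"

print(coin_toss())
```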
So the answer is that Bob can cheat, and the reason that Bob can cheat-- well, remember this hash function is not encryption. There are no secrets that go into it. It's only providing this commitment to the input, and so Bob can easily cheat in this case. He can compute by himself the hash of zero, the hash of one, check them, whether they equal m, and then instead of guessing, he can pick whichever one did. So, with this protocol Bob can always win. I'm going to leave it as an open discussion problem for you to figure out how to fix this protocol. And there are definitely lots of different ways to do this. I hope there will be some interesting discussion about it on the forums.
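Bob's cheat is easy to demonstrate: since there are only two possible inputs and no secret goes into the hash, he can hash both values himself and compare against the commitment. A sketch, again using SHA-256 as a stand-in for H:

```python
import hashlib

def H(x: int) -> bytes:
    # SHA-256 standing in for the lecture's ideal hash function.
    return hashlib.sha256(str(x).encode()).digest()

def bob_cheats(m: bytes) -> int:
    # With only two possible inputs and no key involved, Bob can
    # recover x from the commitment m by trying both values.
    for guess in (0, 1):
        if H(guess) == m:
            return guess
    raise ValueError("commitment matches neither 0 nor 1")

# Alice commits to x = 1; Bob learns x before "guessing" and always wins.
assert bob_cheats(H(1)) == 1
assert bob_cheats(H(0)) == 0
```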
The next thing we're going to look at is how big the hash output needs to be to provide the properties we need. First let's just consider weak collision resistance. The question is: assuming that our attacker has just enough computing power to perform 2 to the 62 hashes, how many output bits does our ideal hash function need--and we're assuming now that we have an ideal hash function that provides a perfectly uniform distribution on the outputs--to provide weak collision resistance? To formalize what we mean by providing that, let's assume that we're satisfied as long as the attacker's probability of success in finding some pre-image that hashes to our value h is less than one-half. Of course, for most applications we want this probability to be much less than one-half. To answer, give the number of bits in the output of H.
So the answer is that we need 63 bits to ensure this probability is less than one-half. The way to think about this, assuming the uniform distribution of our ideal hash function, is that the probability that any one guess maps to h is 1 over the number of possible outputs, or 2 to the negative b, where b is the number of output bits. So with our random oracle model, for any given guess the probability that it hashes to a particular value is 2 to the negative b. The probability that a guess is bad is 1 minus that. Over a series of guesses, the probability that they're all bad is that value raised to the power of the number of guesses, k. Here k is 2 to the 62, so with b equal to 62 the attacker's success probability is 1 minus (1 minus 2 to the negative 62) raised to the 2 to the 62 power, which is approximately 0.63.
There are lots of ways to solve this. You could try computing this in Python, though these numbers get pretty big. The easiest way is to just plug this into Wolfram Alpha. There are also, certainly, mathematical simplifications you could do to get this equation into a simpler form, but since we just want the answer, it's easier to try plugging in some numbers. If we increase the number of bits by 1, the probability of each guess being bad increases to 1 minus 2 to the negative 63, and the attacker's overall success probability turns out to be about 0.39. That means 63 is the fewest number of bits that gives the attacker less than a 50% chance of finding a pre-image that maps to the same hash value in 2 to the 62 guesses.
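The calculation can be checked numerically. A small sketch of my own: since (1 - 2^-62) rounds to 1.0 in floating point, direct evaluation fails, but the standard approximation (1 - 1/n)^n ≈ e^-1 gives the success probability as 1 - exp(-k / 2^b):

```python
import math

def preimage_success(b: int, k: int) -> float:
    # Probability of finding a pre-image for a b-bit ideal hash in k guesses:
    # 1 - (1 - 2**-b)**k, approximated as 1 - exp(-k / 2**b) since
    # (1 - 1/n)**n approaches 1/e for large n.
    return 1 - math.exp(-k / 2**b)

print(preimage_success(62, 2**62))  # ≈ 0.632, too high
print(preimage_success(63, 2**62))  # ≈ 0.393, first value below one-half
```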
The key point is that this was for weak collision resistance. For weak collision resistance, our assumption is that an attacker needs to do about 2 to the b work, where b is the number of hash bits, to have a good chance of finding a pre-image that hashes to the target value. Obtaining strong collision resistance is actually much harder: the attacker's effort is more like 2 to the b over 2, so we'll need about twice as many output bits in our hash function to provide it. The reason for this is what's known as the birthday paradox.
It's called a paradox, but it's not really a paradox. It's a surprise to people who haven't followed the mathematics, but there's nothing paradoxical about it--it's not like the Berry paradox, which leads to a contradiction. The way this is usually framed is that you have some group of k people--let's say it's a kindergarten classroom--and you want to know the probability that 2 people have the same birthday. The reason this is called a paradox is that the answer is much higher than most people expect. So let's compute it. We'll assume there are no leap years and that birthdays are uniformly distributed; we assume the same property for the outputs of our ideal hash function. We'll compute the complement probability that there are no duplicates, and then the probability that there is at least 1 duplicate is 1 minus this. The way to think about that is we can go through the people in the class and assign a birthday to each one: b0 to the first, b1 to the second, and keep assigning birthdays. To assign birthdays with no duplicates, there are 365 choices of day for the first birthday. For the second, if we want no duplicates, we can't use whatever day we picked for the first one, so it will be 365 times 364 times 363 and so forth. That's the number of ways to assign birthdays with no duplication. We're trying to compute a probability, so we divide that by the total number of ways to assign birthdays, allowing duplicates, which is just 365 choices for each person. In general, the first value is n factorial divided by n minus k factorial, and the denominator is just n to the k, where n is the number of possible days (or the number of possible hash outputs) and k is the number of trials. The probability that there are duplicates is just 1 minus this ratio. This is what we need to analyze for strong collision resistance.
It's higher than what we needed for weak collision resistance, which was 1 minus (1 minus 1 over n) to the k. To actually compute this, the numbers get really big if we try to evaluate the factorials directly; instead we need to use approximations. But to give you some idea what the results are like: for the duplicate birthday question, with 365 days and 20 people, the probability that there's at least 1 duplicate already exceeds 0.4.
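To make the birthday numbers concrete, here's a small sketch of my own that computes the no-duplicate product incrementally rather than evaluating the factorials, which avoids the huge intermediate values:

```python
def collision_probability(n: int, k: int) -> float:
    # Probability of at least one duplicate among k samples drawn
    # uniformly from n possible values: 1 - n! / ((n - k)! * n**k),
    # computed as a running product to keep the numbers small.
    p_no_dup = 1.0
    for i in range(k):
        p_no_dup *= (n - i) / n
    return 1 - p_no_dup

print(collision_probability(365, 20))  # ≈ 0.41: over 40% with only 20 people
print(collision_probability(365, 23))  # ≈ 0.51: better than even with 23
```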
If we're thinking about hash collisions, if we only had a 64-bit hash and our attacker was much weaker than the one we hypothesized-- let's say they can only do 2 to the 32--that's already almost a 40% chance of success. This success goes up quite quickly. If the attacker can do 2 to the 34 hashes, then the success probability is very close to 1. So the conclusion is that given an ideal hash function with N possible outputs, an attacker needs about N guesses to find an input that hashes to a particular value but only needs about the square root of N guesses to find a pair that collide. This assumes the attacker can store all those hash values as they try the guesses and compare it to all the previous ones. This is the reason why hash functions need to have large outputs. SHA-1, which was a widely used hash function (Secure Hash Algorithm), used 160 bits in its output. This was actually broken. There is a way to find collisions using SHA-1 with only 2 to the 51 operations-- much fewer than the 2 to the 80 that one would expect from this square root estimate, and that's because of mathematical weaknesses in the way SHA-1 works. SHA-2, the output can be either 256 or 512 bits. As long as there aren't mathematical weaknesses in the cipher, if it was a really ideal hash function, this would be big enough to defeat any realistic attacker, but there are beginning to be suggestions that there may be ways to break this. No practical breaks have been found yet. For SHA-3, the winner is expected to be announced later in 2012. So for the rest of this class we're going to assume that we do have an ideal hash function and that it has enough outputs to provide strong collision resistance against a motivated attacker.
The last thing we'll do this unit is look at an application that combines many of the things we've seen so far. Our goal is to figure out how to use passwords to do authentication. We're going to assume that we have some file or database to keep track of users, and for each user we want to keep track of something that will allow us to authenticate that user. In UNIX, this file is stored in /etc/passwd.
So our goal is that even if some malicious insider or someone who can compromise our system can get access to the database or the password file, they won't be able to impersonate users. They won't be able to log in just by being able to read this file. If they can modify this file, that's another story, but let's assume that they can't modify it, only read it. This is not a completely unrealistic model. This is actually the model that motivated the design of the UNIX password system: there are lots of programs that need to read this password file but only very privileged applications that need to write to it. So really bad idea number 0 is to just keep the passwords in clear text. There actually are web applications that do this, and any web application that does is doing some things very, very wrong. First of all, it's wrong to send a password in email, especially one that was created by a user, but it also means that they're storing your password in a way that they can recover it. That probably means that anyone else who has access to their database can also recover it. So we want to do something better. We want to provide the property that even if someone can get access to the database and read all the user and password information, they can't break into your account. So that was really bad idea number 0: keeping the passwords in clear text. Now we're going to move on to really bad idea number 1, which sounds a little bit more plausible but is not a good way to store passwords.
So here is the plan. The server operator will generate some key; let's assume the server can generate a good random key. For each password we're going to store the encrypted value of that password, encrypted with the key using CFB mode--cipher feedback mode--with a message block size of 8 bits, so we can encrypt a string of any length, one character at a time. (The block size could be 7 or 8 bits depending on how we encode the characters.)
The question is, why is this a really bad idea? Here are all the possible choices. Select all the ones that apply:
That it reveals the length of the password;
that the encrypted text reveals when 2 users have the same password;
that it requires use of E, an encryption function, which is slower than using a hash function;
that if the key is compromised, that reveals all of the passwords.
The answer is all except for the third are true. The third is not true because encryption is generally faster than hashing. These 3 are really big problems with this scheme. Let's talk about the fourth one first. We need this key to decrypt. The program that's running on the server that needs to check passwords will need this key all the time, so chances are if the password file is compromised, the key is also compromised because it's available in memory, it's stored in this program, or it's readily accessible because we need it every time we check a password. The solution to this is that we don't actually need to invert the password to check it's correct. All we need to do is recompute from the entered password some function and check that with what's stored. So there's no reason to have a key here.
The first reason is also true: because we're encrypting the password, the stored value reveals the password's length. The reason that's a bad idea is that it reveals information about the password, and it also helps an attacker out--it tells them which passwords are short, so the attacker knows which ones to try to break first. The easy way to fix this is, again, to use a hash function. With a hash function, the size of the output doesn't depend on the size of the input, so no matter how long the password is, we'll store the same number of output bits.
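The fixed-output-size point is easy to see in code. A quick sketch using Python's hashlib with SHA-256 (my choice of hash; the lecture doesn't specify one):

```python
import hashlib

# No matter how long the password is, the stored digest is the same
# length, so the password file leaks nothing about password lengths.
for pw in ("a", "correct horse battery staple", "x" * 1000):
    digest = hashlib.sha256(pw.encode()).hexdigest()
    print(len(pw), len(digest))  # the digest is always 64 hex characters
```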
The second one is a little more subtle. If we just use the hash function in a straightforward way, we won't solve this problem, and this problem is pretty important. If you look at typical passwords, many are the same. According to an analysis of the 32 million passwords leaked from RockYou.com--which did not store passwords in an encrypted format, so all the actual passwords were easily revealed--the most popular password was 123456. Over 290,000 people used it; that's almost 1 percent of all the accounts. If that's your password, you should probably change it. Once you get to a set of a few thousand popular passwords, you're covering a large percentage of users. So this straightforward approach fails on many accounts.
Now we'll look at a slightly better idea. This is actually what was done by early versions of UNIX. The idea was to store, for each user, the result of encrypting the value 0 using the user's password as the key. This means there's no key to keep secret on the server. Someone who compromises the password file knows that the value being encrypted is 0 but would need to test possible passwords to find the one that matches. To make this harder, you wouldn't just run the encryption once; you would run it many times. Encrypting twice doubles the work to check a password, but it also doubles the attacker's work for each password guess, and since the attacker has to test many more passwords than just the one that was entered, this scales the attacker's total work up much more than it scales the checking work. The way this worked in UNIX, the encryption was done 25 times.
So the value that was stored for each user was the result of encrypting, using the password as the key and 0 as the initial value, looped around 25 times. There are some problems with the way this was done in early UNIX systems. The encryption function was DES, the Data Encryption Standard, which used 56-bit keys, and the key was generated from the password by just truncating it: only the first 56 bits of the user's password are used. Encoded as 7-bit ASCII, that's only 8 characters--actually a little less than that, because DES placed some restrictions on the key structure. If users selected their passwords using only uppercase letters, lowercase letters, and digits, that corresponds to 62 possibilities per character. Most users would not pick uniformly from those 62 possible characters, but even if they did, there are 62 to the 8 possible passwords, which is less than 2 to the 48. For someone with modern computing resources, searching a space like this is very feasible. The reality is even worse than this: a motivated attacker can pre-compute a dictionary--pre-compute this value for a set of common words w, store those pre-computed values--and then check every newly compromised password file against this list, with a good likelihood of finding some accounts that can be broken into. So how can we make dictionary attacks harder?
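The iteration idea can be sketched briefly. This is my own sketch, substituting SHA-256 iteration for the DES-based scheme the lecture describes (the structure is the same: loop the function, store only the final value, and check by recomputing):

```python
import hashlib

def stretch(password: str, rounds: int = 25) -> bytes:
    # Iterate the function to multiply the work per password guess.
    # (Classic UNIX crypt looped DES 25 times; SHA-256 stands in here.)
    value = password.encode()
    for _ in range(rounds):
        value = hashlib.sha256(value).digest()
    return value

stored = stretch("hunter2")          # what the server keeps; no key needed
assert stretch("hunter2") == stored  # checking just recomputes the loop
assert stretch("hunter3") != stored  # a wrong password fails
```

Note that raising `rounds` slows every check by the same factor, but the attacker pays that cost once per dictionary entry, for every guess.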
One idea would be to train users to pick better passwords. Users are pretty hard to train. This usually means coerce instead--force users to pick a password that satisfies some properties. It could be a minimum length, a mix of upper and lowercase letters and special characters. Many websites do this. It usually just leads to annoyance and doesn't necessarily lead to better passwords. You can certainly come up with bad passwords that satisfy any set of rules that someone forces you to use. So another solution would be to protect the encrypted passwords to make sure that an attacker or an insider can't get access to them. This would certainly help, would prevent the dictionary attack, because they wouldn't have access to the encrypted passwords. They could still try a small number through the login interface. If the passwords are bad enough, the dictionary attacker will still succeed but only breaking a small number of accounts. The third approach is to add salt. This is the one I'll talk about next.
So with a salted password scheme, here's what we'll store in the password file: we'll have the users, we'll have an extra column that we'll call the salt, and then we'll store the encrypted password. The salt is just random bits. They don't need to be kept secret. In the UNIX scheme there were 12 random bits, and they're different for each user; that's why they're stored in the table.
For user Alice let's say we have the salt 011010001111, which is an apparently random sequence. What we'll store as the encrypted password is the result of hashing the salt concatenated with Alice's password, and there are different ways of doing this. Some hash functions can be modified to behave differently based on another parameter. But if it's a simple hash function that just takes an input, we can make the input the salt concatenated with the password, and then for Bob we'll do the same thing, but we'll have a different sequence of random bits.
Here what we're storing is the result of hashing Bob's salt, which is his own selection of random bits, concatenated with Bob's password. This means that as long as the salts are different, even if the passwords are the same, the stored values will be different. This is very similar to what standard UNIX systems did, where H is actually DES encryption applied 25 times, with the salt modifying how the encryption happens, the password as the key, and 0 as the initial input. So to see if you understand the impact of salting,
I have 2 quizzes. The question is, how much harder does adding salt make it for an attacker who compromises a password file who just wants to learn Alice's password?
The possible answers are:
not much harder at all;
about twice as hard as it would be without the salt;
and about 2 to the 12, which is 4096, times harder than it would be without the salt.
The answer is not much harder at all. The attacker has compromised the password file. That contains the line giving the salt for Alice and her encrypted password. To break the password, the attacker needs to try possible passwords concatenated with that salt and find one that matches the hash value. This seems like the salting doesn't help us at all, but let's change the question a little bit.
So now the question is, how much harder does the addition of salt make it for an attacker who wants to carry out offline dictionary attacks? That means the attacker can pre-compute a dictionary of encrypted passwords and then for each collected password file wants to find any passwords that match the values found in that dictionary.
The answer is that it makes the dictionary attack about 2 to the 12 times harder: with a 12-bit salt, the attacker must pre-compute a separate dictionary for each of the 4096 possible salt values. So salting helps a little bit, but it doesn't avoid many of the problems with passwords. One big problem with passwords is that they're reused. There are lots of ways to reuse passwords. Some of this is using the same password across many sites, but that's not what I'm talking about here. I'm talking about the fact that every time Alice logs in she enters the exact same password. The password is the same until she does something to change it, so she's typing the same password many times. This means that if something is logging what she types, it will learn her password, as will someone looking over her shoulder--a shoulder surfer--if she types it in an Internet cafe or somewhere visible. It's also the case that the device she enters her password on will start to show smudges where she types it. This is a particular problem for short PINs on smartphones, which are entered so many times that finger smudges start to give an idea of what the PIN might be. All of these problems stem from using the same password every time she logs in. So we're going to talk about one way to avoid that, which is to use a hash chain. Hash chains have lots of interesting applications. In this case, we're going to use a hash chain so we still have the nice property we had with the password file: the server stores no secrets. It doesn't matter if all the data on the server is compromised; that still wouldn't give someone the ability to log in as Alice.
Here's the idea of how to do this. We start by selecting some secret s, then we compute the hash of that secret, and then we do it again, computing the hash of the result we got the first time, and we keep computing hashes. This could be done anywhere; I've written it on the server side. The important point is what Alice gets at the end: the hash of s, the hash of the hash of s, the hash of the hash of the hash of s, and so forth. Of course, if she just has s, she can compute all of these herself. The server stores only 1 thing: the last value in this hash chain. Let's suppose we did this 100 times. That final value is the only value stored by the server. So now the protocol to log in is that Alice sends the 99th value in the hash chain as p. The server computes the hash of p and checks whether it's equal to the stored value x. If it is, it allows Alice to log in and also updates the value of x: the new value of x is the value Alice sent as p. That's how the first login works. Now the question is, the next time Alice logs in, what does she need to send? Here are the choices--different numbers of times of applying the hash, starting from the secret s.
The answer is that Alice needs to send the value of hashing s 98 times. The hash chain goes backwards, because we can only verify hashes in 1 direction. The hash is hard to invert--that's the valuable property the hash function gives us--so we have to work backwards through the chain if we want to use it for authentication, and here we're using it to authenticate Alice. If someone just knows x, or intercepts p, the previous password value, they could compute any of the later values; those are easy to compute once you have p. This one was p, this one is just the hash of p, and this one is the hash of the hash of p. The only one that would be hard to compute is hashing s 98 times. The server can check that it's correct using the same process: at this point the value of x is hash of s done 99 times, so when the value of p that's sent is hash of s done 98 times, the check will succeed only if the value sent was correct. What I've described is known as the S/Key password system. The way S/Key worked, the server would generate the hash chain--let's say there are 100 entries. Alice would print these out in a list, with the values turned into strings that are easier to type than raw bit sequences. The server would store the last entry in that hash chain and nothing else, so what's stored on the server could not be used to log in; the list that Alice has could be. And I should correct one detail: if Alice starts with H 100 as the first thing in her list, what the server should actually store is H 101. This scheme has a pretty big downside--it requires Alice to carry around a list of passwords, and instead of remembering something that becomes easier and easier to type, she has to look at the list, type the password correctly, cross that one off, and use the next one next time. At some point she's going to run out.
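The chain construction and the server's login check can be sketched as follows; this is my own sketch of the S/Key idea, with SHA-256 standing in for the ideal hash H:

```python
import hashlib
import os

def h(x: bytes) -> bytes:
    # SHA-256 standing in for the ideal hash function H.
    return hashlib.sha256(x).digest()

def make_chain(secret: bytes, n: int) -> list[bytes]:
    # chain[i] is H applied i+1 times to the secret.
    chain = [h(secret)]
    for _ in range(n - 1):
        chain.append(h(chain[-1]))
    return chain

secret = os.urandom(16)
chain = make_chain(secret, 100)
x = chain[-1]            # the server stores only H^100(s)

# Alice logs in with H^99(s), then H^98(s), ... working backwards:
for i in range(98, -1, -1):
    p = chain[i]
    assert h(p) == x     # the server's check: H(p) must equal the stored x
    x = p                # on success, the server updates its stored value
```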
She's going to need to get on a secure connection again to generate a new hash chain to be able to keep doing this.
There's an interesting story about using hash chains for hotel doors. I'm not positive if this story is true, but I've heard it from fairly reputable sources and it seems too good not to be true. So if you think about a hotel, you've got rooms-- let's say this is Room 387--and you've got a check-in desk that may not be connected by any wire to the room. What you'd like to happen is when a new visitor checks in-- let's say it's Alice--the desk at the check-in can give her a new key, and that key will allow her to open the door. There's no connection between the check-in desk and the door, and we want to know that Alice will be able to enter the room. Whoever stayed in the room previously--let's say that was Bob-- shouldn't be able to enter the room anymore.
After Alice checks in and enters the room, Bob should no longer be able to enter it, and we'd like all this to happen without any coordination between the desk and the door. One way to do that is to use a hash chain. Here's what the door does when it reads a card--and I'll write this in sort of Python-ish code. The door has some stored value x. When it reads a key, it checks whether the hash of that key is equal to x, the stored value. If it is, the door unlocks and everything is good: we opened the door with the correct key. There's another situation we need to test for, and that's whether the hash of the hash of the key is equal to x. In terms of the hash chain, that would mean this is the next key in the chain. If that's the case, we also want to unlock the door, but we update the stored value x to be the hash of the current key, hash of k. So that's our read protocol. What this means is that if Bob is using the room first, his key k1 has the value of some number of hashes--say m--applied to some secret, and the value of x is m + 1 of those hashes. Bob keeps opening the door, going through the first path through the code. (There wouldn't really be a Python interpreter in the door; this could be done by something much simpler.) Then when it comes time for Alice to open the door, her key has the value of 1 fewer hash of s. That means the hash of the hash of Alice's key is equal to the stored value--hashing m + 1 times starting from s--so Alice can open the door. Doing so changes the value of x, so the door no longer opens for Bob. The next person who checks in will be able to enter, and then it will stop working for Alice.
What the check-in desk needs to do, it needs to keep track of some secret, and it needs to keep track of the number of the guest. The initial value stored as x needs to be set, but once those are done, there's no coordination needed. Every time a new guest arrives, they'll be given a key, which is the result of hashing n times starting from s, and the value of n would be decreased by 1. This is our design for allowing secure doors with a sequence of users.
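The door's read protocol and the desk's bookkeeping can be fleshed out from the lecture's Python-ish pseudocode. This is a sketch under my own assumptions (SHA-256 as the hash; the class and function names are mine):

```python
import hashlib
import os

def h(x: bytes) -> bytes:
    # SHA-256 standing in for the hash function in the lock.
    return hashlib.sha256(x).digest()

class Door:
    def __init__(self, x: bytes):
        self.x = x                  # the only state the lock stores

    def read(self, key: bytes) -> bool:
        if h(key) == self.x:        # current guest's key: unlock
            return True
        if h(h(key)) == self.x:     # next key in the chain: unlock
            self.x = h(key)         # ...and roll forward; old key now fails
            return True
        return False

# The check-in desk keeps a secret s and a decreasing counter n,
# handing each new guest the key H^n(s).
s = os.urandom(16)
def hn(n: int) -> bytes:
    v = s
    for _ in range(n):
        v = h(v)
    return v

door = Door(hn(101))
bob_key, alice_key = hn(100), hn(99)
assert door.read(bob_key)        # Bob opens with his key
assert door.read(alice_key)      # Alice checks in; the lock rolls forward
assert not door.read(bob_key)    # Bob's key no longer works
```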
So to see that you understand it and understand hash chains well, we have a quiz. What would cause problems with this design? If a criminal could extract the stored value from the lock; if a guest checks in to the hotel but never actually enters her room; or if a guest enters the room many times.
The answer is the second choice. If a guest checks in but never enters the room, here's what goes wrong. If Alice never enters the room, then the next guest who checks in gets a key that is the next value in the chain, the result of hashing m - 2 times, but the stored value of x is still the result of hashing m + 1 times. That means hashing this new key twice gives only the result of hashing m times, which won't match the value of x, so the new guest won't be able to open the door. The hotel operator would need to do something to reset the value of n or resynchronize the door and the check-in desk. And apparently, the hotels that adopted this way of doing keying never imagined the possibility that a guest would check in but never actually enter the room with her key. Apparently, this happens more often than you might think.
We saw how to use a symmetric cipher to take a small amount of random data and produce a sequence of values that appear to be pseudo-random. We also looked at the problem of how to encrypt a large file or a large message, and that brought us to look at modes of operation for using symmetric ciphers. We talked about the cipher block chaining mode, the counter mode, and the cipher feedback mode, all of which have different advantages and disadvantages.
We also looked at how to do fair coin tosses remotely, and that led us to the need for cryptographic hash functions. We saw how to use those to check user passwords in a way that doesn't require keeping any secrets on the server, and we also saw how to use hash chains so that passwords never need to be reused.
The big problem we haven't addressed yet is how to establish a shared key. If we want to use a symmetric cipher to allow 2 parties to communicate, they have to agree upon the shared key beforehand. If you think about most ways we want to communicate today, that's very difficult. If you visit a secure website, you don't have a shared secret with them to begin with, but you want to start communicating securely with them despite not having a shared secret. And for most uses of cryptography today this is a really big problem. We can't assume that we have a shared secret with every website that we want to use. We need some way to establish a shared secret without having one to begin with. That's the main topic of Unit 3. I hope to see you there.