How many pairwise keys are needed for 100 people to be able to communicate directly with each other?
The answer is 4950. Suppose we have 100 people; I'm not going to try to draw 100 people, so here we have N people. The first person needs a shared key with each of the others: that is N-1 keys. The second person already shares a key with the first, so needs N-2 additional keys, and so on down to the last pair. That means we need the summation from N-1 down to 1, or, going forward, 1 + 2 + ... + (N-1), and that simplifies to N times N-1 divided by 2. For 100 people, this would be 100 times 99 divided by 2, which is 4950. That might not sound too bad, but 100 people is a pretty small number.
If we think of doing this for the billion or so people on the internet, then we would need around 10 to the 18 keys. This isn't really a solution, both because of the huge number of keys that are needed and the need to establish all of these pairwise keys. We haven't really solved our problem.
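The counting argument above is easy to check directly. Here is a small sketch (the function name is my own) that computes the number of pairwise keys for any group size:

```python
def pairwise_keys(n):
    # Each of the n people shares one key with each of the other n - 1
    # people; each key is counted twice that way, so divide by two.
    return n * (n - 1) // 2

print(pairwise_keys(100))     # 4950 keys for 100 people
print(pairwise_keys(10**9))   # roughly 5 * 10^17 keys for a billion people
```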
What could go wrong?
Trusty Keys can read all messages between A and B. Trusty Keys can impersonate any customer.
Under certain circumstances C could tamper with message (3) to steal
Merkle's Puzzles. The basic idea is that Alice creates many puzzles, and Bob randomly picks one of the puzzles to solve. Solving that puzzle gives Bob the number of the puzzle and a secret. What he sends back to Alice is the number of the puzzle; Alice knows the corresponding secret, and they'll use that as the key.
The reason this gives you a way to establish a key is that Bob randomly picks one puzzle to solve, whereas an attacker would have to solve many puzzles before finding the one that Bob actually solved.
To create the puzzles, Alice creates a list of N secrets, each a random number s bits long. Then she also creates a set of N messages. Each message is encrypted using the encryption function they agreed on, with a key that is the corresponding secret concatenated with enough zeros to make the appropriate key length. If this were a 128-bit AES key, the secret might be, say, 40 bits, and we would add zeros to fill out the rest of the key. What we encrypt under that key is a message that gives the identity of the puzzle, the index i, preceded by a known string so it's clear that the decryption is the right message. If we just included a number, and there were enough numbers, we might accidentally decrypt some other message to a valid-looking one.

After this, Alice shuffles the messages and sends all N of them to Bob. Bob randomly picks one, then does a brute-force key search: he tries to decrypt the message he selected using each guessed secret concatenated with the zero padding. Eventually one guess decrypts to the marker string followed by a number. At this point Bob knows the guess, so he knows the secret that was used, and he knows the puzzle number.

If we want a larger key, which we probably do, what we should include in the puzzle instead of just the index i is a key (and this means the key can be of any length): some new random key associated with that puzzle. Alice keeps track of those keys, so for each puzzle, when Bob decrypts it, he acquires both the puzzle number and the key. He sends the number of the puzzle back to Alice, and Alice can look up in her list which key was the one in the puzzle Bob decrypted.
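As a concrete illustration, here is a minimal Python sketch of the scheme. It deviates from the description in one way so that it's self-contained: instead of padding a short secret with zeros into an AES key, it uses a toy XOR cipher keyed by hashing the secret. The 2-byte secret size and the `Puzzle` marker string are arbitrary demo choices, and all names are my own.

```python
import hashlib
import os
import random

PREFIX = b"Puzzle"
SECRET_BYTES = 2   # tiny so the brute force is instant; real use would be larger

def toy_encrypt(key, message):
    # Stand-in for the agreed cipher: XOR the message with a keystream
    # derived by hashing the key. XOR makes decryption the same function.
    stream = hashlib.sha256(key).digest()
    return bytes(m ^ s for m, s in zip(message, stream))

toy_decrypt = toy_encrypt

def make_puzzles(n):
    # Alice: n random secrets, n random session keys, and n encrypted
    # messages of the form PREFIX || index || session key.
    secrets = [os.urandom(SECRET_BYTES) for _ in range(n)]
    keys = [os.urandom(16) for _ in range(n)]
    puzzles = [toy_encrypt(secrets[i],
                           PREFIX + i.to_bytes(2, "big") + keys[i])
               for i in range(n)]
    random.shuffle(puzzles)
    return keys, puzzles

def solve_puzzle(puzzle):
    # Bob: brute-force the small secret until the marker string appears.
    for guess in range(2 ** (8 * SECRET_BYTES)):
        secret = guess.to_bytes(SECRET_BYTES, "big")
        plain = toy_decrypt(secret, puzzle)
        if plain.startswith(PREFIX):
            i = int.from_bytes(plain[len(PREFIX):len(PREFIX) + 2], "big")
            return i, plain[len(PREFIX) + 2:]
    raise ValueError("no secret found")

keys, puzzles = make_puzzles(64)
i, key = solve_puzzle(random.choice(puzzles))  # Bob picks one puzzle
assert key == keys[i]   # sending i back lets Alice look up the same key
```

Bob solves a single puzzle; an eavesdropper who sees only the puzzle number would, on average, have to solve about half of the N puzzles before finding the one Bob picked.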
The answer is the first one.
The next important property is that it's actually secure, and we'll focus first on security against a passive eavesdropper: an adversary that can listen in on all the messages but can't modify them. That means the adversary can obtain the public values q and g, and can hear all the messages transmitted, so it learns yA and yB. Our goal is to be confident that the adversary would not be able to learn kAB.
The question is which of these would be most convincing in proving that the Diffie-Hellman protocol is secure?
The correct answer is the third choice.
What we'd really like to be able to show is that we can reduce some known hard problem to the Diffie-Hellman problem, and the important thing is that this would show that anyone who can solve the Diffie-Hellman problem would also be able to solve some problem that we already know is hard.
Showing that the problem is hard just shows that, for a large enough instance of it, it can't be solved with a given amount of computational resources, and we hope that by picking a large enough instance we can push the attacker's work far beyond what any attacker could afford. That depends on the number of possible values being huge, but that by itself is not a useful proof of security. What we also need to know is that there's no easier way to solve the problem than trying all the possible values.
The hard problem that is closely related to the Diffie-Hellman security property is the discrete log problem.
Discrete logs are like continuous logs but over a discrete group.
With continuous logs, if we have a to the x equals b, and we know a and b, we can solve for x. That's the log base a of b, and there are well-known, efficient ways to compute these logarithms. One of the earliest uses of computers was to compute tables of logarithms. With discrete numbers, this gets much more interesting. Now we have a to the x equals b, modulo some value n, and our goal is to solve for x, which is the discrete log base a of b. This turns out to be, as far as everyone can tell, a very hard problem when n is a large prime number.
It's not clear that a discrete log always exists, and for certain choices of a, b, and n, it does not exist. But if we choose n as a large prime number and a as a generator, then by definition it must exist. What it means for a number g to be a generator is that if we raise g to each power, what we get is a permutation of the numbers in the multiplicative group mod n. As a little demonstration, certainly not a proof, here's some code that, given some generator and some modulus, raises the generator to every power between 1 and the modulus minus 1 and produces that permutation. We can try that with a fairly small prime number so you can see the results.
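Here is a plausible version of that demonstration in Python (a sketch; the function name is my own, not the exact code from the course):

```python
def generator_permutation(g, p):
    # Raise g to every power from 1 to p - 1, modulo p. If g is a
    # generator of the multiplicative group mod p, this list contains
    # every number from 1 to p - 1 exactly once.
    return [pow(g, i, p) for i in range(1, p)]

perm = generator_permutation(5, 277)
print(perm[:4])   # [5, 25, 125, 71]
# A brute-force check that 5 really is a generator mod 277:
assert sorted(perm) == list(range(1, 277))
```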
We'll use 277 as our prime number and 5 as a generator for 277. One could check in a brute-force way that it produces all the numbers, and we'll see that in the output of the generator-permutation code. In the results, the first value is 5, that's 5 to the 1; a few entries later is 71, because 5 to the 4 mod 277 is 71. If we look at all the numbers, they form a permutation of the numbers from 1 to 276. Other than the early ones, it would be fairly hard to predict where a number falls in this sequence; you could certainly compute the whole sequence to find it.

The question the discrete log asks is: given a number, can you figure out where it is in this sequence, that is, the power you need to raise the generator to in order to produce it? The claim is that this is hard to do. Showing this sequence is certainly not enough to convince you of that, and there's no proof that it's hard. The reason people believe it's hard is that many smart people have tried to find good ways of doing it, and none of the known solutions run in polynomial time; the fastest known solutions are exponential. That means, essentially, that the only way to solve this is to try possible powers until you find the one that works. You can do a little better than that by trying powers in a clever way, and you can exclude some powers more quickly, but you can't do fundamentally better than this exponential search, which is exponential in the size of n. This is something we have to be careful about when we talk about hard problems.
When we say it's exponential, it's not exponential in the value of n; it's linear in the value of n, since we just need to try about n operations. But the value of n grows as 2 to the number of bits needed to represent n. So as long as that's the best solution to discrete log, then for very large n it is intractable: no matter how many computing resources you have, you can't do this exponential search.
You can't find the value of x that's the discrete log of b, base a, mod n. So as long as no one can find a fast way to solve the discrete log problem, as long as n is large and this is an arbitrary instance of the problem, we think it should be hard to compute x given a, b, and the modulus.

For this quiz, we assume we have an adversary that's passive, so all it can do is eavesdrop on the messages, but it also has access to powerful computing resources: a procedure dlog that is a fast procedure for computing discrete logs on any inputs, and a fast modular-exponentiation procedure that outputs base to the power, mod the modulus. Now the question is: can the adversary break a Diffie-Hellman key? Since they're passive attackers, they've eavesdropped on all the messages between Alice and Bob, so they have all the values that were sent over the insecure channel. The possible answers are: no, it's impossible with no more resources or information; or yes, there is a way to do it, and here is how the adversary would compute it.
The answer is the second choice. If it were possible to compute discrete logs quickly, and our adversary had a way to do that, then they could compute the discrete log of the intercepted value yA, using the g and q values, which were also intercepted (those were public anyway). The result would be the xA value that was Alice's secret, and then the adversary can compute the key the same way Alice would. This was the only secret value, and if you had the discrete log function, you could compute that secret value.
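To make this concrete, here is a toy sketch in which a brute-force search stands in for the hypothetical fast dlog procedure (feasible only because the numbers are tiny; all names and values are illustrative):

```python
def dlog(b, g, q):
    # Stand-in for the hypothetical fast discrete-log procedure:
    # brute force, which only works because q is tiny here.
    for x in range(1, q):
        if pow(g, x, q) == b:
            return x
    raise ValueError("no discrete log found")

q, g = 277, 5                          # public values
xa, xb = 123, 201                      # secrets, never transmitted
ya, yb = pow(g, xa, q), pow(g, xb, q)  # the intercepted messages
kab = pow(yb, xa, q)                   # the key Alice and Bob both compute

# Eavesdropper: recover Alice's secret from ya, then derive the key from yb.
recovered_xa = dlog(ya, g, q)
assert pow(yb, recovered_xa, q) == kab
```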
The second answer doesn't compute the right thing: it takes the discrete log of yB with the wrong inputs, and the result is not the value that is necessary to compute the key.
The third choice doesn't use discrete log at all. If we could actually compute the key this way, the protocol would be completely insecure, because it turns out that modular exponentiation is a function we can compute efficiently. And it's necessary that we can compute modular exponentiation efficiently, because otherwise the legitimate participants in the protocol wouldn't be able to determine their keys. We will look at that soon.
Before doing that, I want to emphasize that this is not a proof that breaking discrete log is hard.
We need a faster test for primes. What we're going to use is a probabilistic test. That means if some number x passes the test, the probability that x is composite is less than some bound. We're going to use probabilities like 2^-128, which is certainly low enough for most uses in cryptography. One way to think about that: if the key size is 128 bits, we already accept that probability of someone randomly guessing the key correctly.

The main basis of probabilistic primality tests is properties of prime numbers. One useful theorem is Fermat's Little Theorem, which states that if p is prime, and we select some number a between 1 and p and raise a to the (p-1) power, modulo p, the result must always be 1. Maybe we could try lots of a's: if a^(p-1) is always congruent to 1 mod p, that would mean p is probably prime. The problem is that there are some composite numbers, known as Carmichael numbers, where this test also holds for most a values; indeed, it holds for all the a values that are relatively prime to the number being tested. Unless we try nearly all the a values between 1 and p, there is a high probability that the test always holds. If we need to try them all, this test isn't fast enough: it's still exponential in the size of p.
The good news is there is a faster test, known as the Rabin-Miller test (sometimes the Miller-Rabin test), discovered independently by both Miller and Rabin. The main idea was originally proposed by Gary Miller in 1976, who provided a deterministic test based on the assumption that the extended Riemann hypothesis is true. Michael Rabin, who won the Turing Award in 1976 and, quite unusually, probably did some of his most important work after winning it, proposed the probabilistic version in 1980, and that's the one we'll talk about.
Here is how this works; I'm not going to cover the mathematics behind it, just enough to be able to implement the test. We start with our guess n, which must be odd (obviously, if it's even it's not prime). Because n is odd, n - 1 is even, so we can write n as 2^t * s + 1 with s odd. Next we choose some random a value in the range 1 to n - 1.

If n is prime, we know that either a^s is congruent to 1 mod n, or a^(s * 2^j) is congruent to n - 1 mod n for some j in the range 0 to t - 1. That's the big advantage: we only need to try a small number of values. If we don't find any value that satisfies this, we know n is composite. The important property this test has, different from the Fermat test, is that the probability that a composite number passes is always less than some constant, in this case less than one quarter. There are no bad composite numbers like the Carmichael numbers: if we choose a randomly, then for any composite number the probability that it passes the test is less than one quarter.
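Putting those steps together, here is a sketch implementation of the Rabin-Miller test (parameter names are my own):

```python
import random

def miller_rabin(n, trials=64):
    # Probabilistic primality test. Each trial, a composite n passes
    # with probability < 1/4, so 64 trials give error below 2^-128.
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Write n - 1 = 2^t * s with s odd.
    s, t = n - 1, 0
    while s % 2 == 0:
        s //= 2
        t += 1
    for _ in range(trials):
        a = random.randrange(1, n)
        x = pow(a, s, n)
        if x == 1 or x == n - 1:
            continue                # this a is consistent with n prime
        for _ in range(t - 1):
            x = (x * x) % n         # a^(s * 2^j) for j = 1 .. t-1
            if x == n - 1:
                break
        else:
            return False            # no condition held: n is composite
    return True                     # probably prime

print([p for p in range(2, 40) if miller_rabin(p)])
print(miller_rabin(561))            # False: catches Carmichael numbers too
```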
Now for a quiz. If we want the probability that our guess n is composite to be less than 2^-128, how many times should we run the test, each time picking a different, randomly selected a value?
The answer is 64. In each trial the probability that a composite passes is at most 1/4, so we need 64 independent trials for the combined probability to reach (1/4)^64 = 2^-128.
Let's restate our protocol. Alice starts by generating a large prime (we've seen how she can do that using the Rabin-Miller primality test), finding a primitive root (we haven't seen how to compute that, but there are efficient ways to do it), and selecting a random secret in the range 1 to her prime minus 1. Then she computes the value of raising g to that power mod q and sends that to Bob. In this version of the protocol she sends both the prime and the primitive root; those could also have been agreed on earlier in public, since there's nothing secret about them.
Then Bob computes his value, selecting his random secret and computing yb. At the end they can both compute the same key, raising their respective values to the appropriate powers. So we've assumed so far an eavesdropper--a passive attacker-- who can hear all the messages on this channel but can't disrupt the channel. What happens if we have an active attacker?
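Before considering active attacks, the honest exchange just described can be sketched end to end (toy parameters, far too small to be secure; variable names are my own):

```python
import random

# Public values: in a real run, q would be a large Rabin-Miller-certified
# prime and g a primitive root of q; here we reuse the toy pair 277, 5.
q, g = 277, 5

xa = random.randrange(1, q - 1)   # Alice's secret
ya = pow(g, xa, q)                # Alice -> Bob: q, g, ya

xb = random.randrange(1, q - 1)   # Bob's secret
yb = pow(g, xb, q)                # Bob -> Alice: yb

key_alice = pow(yb, xa, q)        # (g^xb)^xa mod q
key_bob = pow(ya, xb, q)          # (g^xa)^xb mod q
assert key_alice == key_bob       # both sides derive the same key
```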
An active attacker can change messages on this channel. Instead of just intercepting them and listening to them, eavesdropping on the messages, they can intercept and change the messages. Is this protocol secure against an active attacker who can modify messages in transit?
The answer is most definitely not.
This protocol depends on the integrity of the messages received. One easy way to see how it can fail: if an active attacker changes the value of ya to 1 and changes the value of yb to 1, then Alice and Bob will still agree on a key, but that key would be 1 raised to their secret powers, which is still 1. A key value of 1 would be known to the attacker and would make all messages encrypted with that key vulnerable. The attacker could also intercept the protocol and separately execute it with each party. This would make Alice think she has a secure key with Bob, when it's actually a key shared with an attacker in the middle, and would make Bob think he has a secure key with Alice, when it's actually a key shared with that same attacker. The attacker in the middle can then take a message Alice encrypts with the key she unknowingly shares with the attacker, decrypt it, re-encrypt it using the key the attacker shares with Bob, and send the result to Bob. Bob will decrypt it, thinking it's a good message from Alice. This is a very dangerous attack.
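The man-in-the-middle attack can be sketched the same way as the honest run (hypothetical names; `xm` is the attacker's secret):

```python
import random

q, g = 277, 5   # same toy public values as before

def dh_share(secret, received):
    # Derive the shared key from one's own secret and the received value.
    return pow(received, secret, q)

xa, xb, xm = (random.randrange(1, q - 1) for _ in range(3))
ya, yb = pow(g, xa, q), pow(g, xb, q)
ym = pow(g, xm, q)               # the attacker's value, sent to BOTH sides

# The attacker replaces ya and yb with ym in transit.
key_alice = dh_share(xa, ym)     # Alice believes this is shared with Bob
key_bob = dh_share(xb, ym)       # Bob believes this is shared with Alice

# In fact the attacker shares a key with each party and can decrypt and
# re-encrypt every message that passes between them.
assert key_alice == dh_share(xm, ya)
assert key_bob == dh_share(xm, yb)
```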
Diffie-Hellman can only be used in places where either the integrity of the channel is guaranteed or there is a way for Bob to find out Alice's ya value and know that it's really Alice. It could also be useful in the case where there's some trusted directory that has the y values. Bob could look up Alice's ya value, know the q and the g values, and then Bob would know that he's communicating with Alice if he trusts this directory.
One way to provide that is using a certificate authority, which we'll talk about in unit 5.
This brings us to the end of unit 3. As long as we have a channel with integrity, we have a solution to our key distribution problem. We can use the Diffie-Hellman protocol where at the end of the protocol we've established a shared secret.
Once we've established that shared key, we can use it to send messages using our symmetric encryption operations from the previous two units. We still haven't solved many of the important problems that Diffie and Hellman identified in their 1976 paper. The one I highlighted at the beginning was digital signatures. We can't do digital signatures this way, because the key is shared between Alice and Bob: it couldn't be used to sign something, since either Alice or Bob would be able to produce that signature. To have digital signatures, what we need is asymmetric cryptography.
That's the main topic for Unit 4. Hope to see you there.