cs313 ยป

So how can we take an algorithm and put that into a Boolean formula? Well, that's where Cook and Levin had another ingenious idea, because they said you can look at an algorithm running on a RAM as a series of snapshots, and what I mean by this is the following: So assume you have an algorithm on a non-deterministic RAM that runs in polynomial time, so you have a point in time T equals 0, that's when your algorithm starts out, then you have T equals 1, T equals 2, so those are the time steps here. And the final time step is going to be some polynomial of N. That is clear because we're looking at an algorithm that solves a problem that lies in NP, So that means it only takes polynomial time on a non-deterministic RAM at least. Now if you look at what happens on the RAM in each time step, I can basically draw you the following picture. If you recall from unit one what the RAM looks like, the RAM has only a few components. The RAM has a read-only memory, the RAM has a program or the algorithm running, so this algorithm here is basically the program running, and just as I would write the algorithm line by line by line, I can also write it in this way, so this here would be the first line of code, then this would be the second line of code, and so on. And finally, the RAM had a reading and writing memory, so we had some memory cells here holding the variables, and those variables, of course, are changed by the program depending on what's here in the input. And now comes the neat part that Cook and Levin observed, because what they observed is that when you look at an algorithm working on the RAM, then you can depict that as a number of these snapshots, so if you say, add T equals 0, these are the contents of the read-only memory. This is, actually we need other information, we also needs to know where the program is at, but you can basically say, this is the input here, this is the program, and this is the line of the program that we're executing right now, and this here is the contents of the read/write memory. In the beginning this will be empty, but as the algorithm works, it will also put some content in here, and then, of course, the program moves on. It can also jump back and forth here, and of course, we will have more and more content in the memory, and at a certain point in time the program will say, I'm done, and it will hopefully have a certain output here. But since we're working with decision problems, actually it's only interesting to us if the program says, yes or no at the end. So for decision problems we don't even really care about what's in here, we would care about that if we were solving the optimization problem or want additional information, but actually for a decision problem, it would just be important for us to know at which line of code the algorithm finishes. If it finishes at a return statement that will return yes to us or a return statement that will return no to us. Now let's have a closer look at those snapshots, and we'll actually do that as a quiz. So this here is one snapshot. I would like you to tell me what we can say about a single snapshot and also about all of these snapshots here. So there are 3 statements, each of which I would like you to check this box if you think they are true and otherwise leave it blank. So the first claim you could make is that each snapshot has size polynomial in N, and N is the size of the input as always. Secondly, you could claim that there can be an exponential number of snapshots if we look at all of the time steps. And finally, one claim that I would like you to check out as well is all snapshots, if taken together, so this whole part here, have polynomial size, and by polynomial size, I again mean that it's some polynomial of the input size that we're always using. You should keep in mind that the algorithm that we're looking at is an algorithm for a problem in NP, and it runs on a non-deterministic RAM.

So there's 2 true answers here, and I know this was a rather challenging question, but just imagine how challenging it must have been for Cook and Levin when they figured this out for the first time. So first of all, each snapshot has size polynomial and N. That is true, so why is that? Well, first of all, we said the size of the input is a polynomial of N or often times it is N, but sometimes it can also be a polynomial. So for example, when we are given a graph with N vertices, then there can be up to N squared edges, but in any case, the input is some polynomial of N. Now the size of the algorithm or program here, that is constant because the algorithm doesn't change with the input, so we can assume that this year has constant size no matter what kind of input we are given. And then finally, this is an interesting one, how much memory does our algorithm need? Well, it's an algorithm for a problem in NP, which means it takes only a polynomial number of time steps, and in each time step, it can only modify a constant amount of variables. So the total memory that it needs is some constant times the number of time steps, and since the number of time steps is a polynomial of N, the total space required here for the memory is also a polynomial of N. So you have a polynomial of N plus some constant plus a polynomial of N, so each single snapshot has size polynomial in N. Now the second one, I think that was the easiest one to find out, is of course not true because we said we're looking at a problem in NP, we should have even written it down here on the left side. So we said the number of time steps must be some polynomial, because otherwise, the problem wouldn't be an NP, and since we're running on a non-deterministic RAM, we can assume that the number of time steps here is bounded by some polynomial. So since there's only a polynomial number of time steps, there can also only be a polynomial number of snapshots, so there cannot be an exponential number of snapshots. But this, if you take it together, is-- well, I don't know it's pretty cool, but at least it's very useful for the proof that we're trying to do, because if each snapshot has a size that's polynomial in N and the total number of snapshots that we have is a polynomial, then you have a situation where you have a polynomial number of polynomial size snapshots, which means that the size of it all taken together is a polynomial times a polynomial, which again is a polynomial. So basically, a complete protocol of what this algorithm here is doing given this input and using this memory here, the complete protocol only takes up a polynomial amount of space. Polynomial and N, the size of the input. What I will now show you is how you can encode a single snapshot. So for example, this one here or this one here as a Boolean formula.

To show you how we can encode a single single snap shot on the REM as a Boolean formula, I'm going to show you a little trick with the Boolean formula. So this Boolean formula-- it looks a bit intimidating, and it doesn't look fun. Well I'll just quickly explain it to you and then help you figure out the special property that it has. So the formula consists of 6 variables: X1, X2, X3 to X6 and it has 7 brackets, so as you can see, each of those brackets contains all of the 6 variables: X1, X2, X3, X4, X5, X6. Sometimes they're a bit mixed up, but every bracket here contains the 6 variables exactly once. They're all connected with an and here, and each one of them looks like this, so up here you have an or over all of those variables, and down here, each bracket has a very similar structure, so you have X1 or you have one of the variables and a not of that variable, or, and this is a very big not over off the other variables, and those are combined by or. Now probably it's a bit tough to figure out by yourself what the purpose of this formula is. I want you to figure it out, but I will give you a few choices. So what's special about this Boolean formula? Does it not have any satisfying assignment? Does it have several satisfying assignments? Can it only be satisfied if exactly one of those 6 variables is set to true, and finally, if you generalize this, so not 6 variables but any number n of variables, and then you do the same structure as you do here. That's the whole resulting formula, have size 0 (n squared) for n variables? So I think by now you're used to more than one of these answers can be true, so check all of these that are correct.

And the answer here is not actually I'll explain them in a minute, and this one here is not correct. So the formula indeed does have a satisfying assignment. Not only one; it actually has several satisfying assignments, but each of those assignments has the property that exactly one variable is set to true, and I'll show you in a second how that works. And of course--I mean, the last thing was probably quite easy to figure out; as you see this already has kind of a square shape, so if we continue like this and add another variable here, then we would have to add another line down here, and each of those formulas grows by 1 and the whole thing grows by one of those brackets here for each variable that we add, so the size is O(n squared) for n variables. So let me explain this here to you. So there are several satisfying assignments, but each of those satisfying assignments has the property that exactly 1 variable is true. So for example, let's say that we set X1 to true, then what will happen is the following, so X1 is true; this means this one down here goes to 0. This one goes to 1, this one goes to 1, this one goes to 1, this one goes to 1, and this one is 1. So let's have a look here at this part of the formula. All of the variables here are connected by an or, so we have X1 or X3 or X4 or X5 or X6. This means if just one of the variables here is set to true, which it now is, then the whole bracket here, so from here to here will also evaluate to true, and then we take this very, very big not here, which means this will evaluate to 0. Now this very big not here evaluates to 0, then we have to make sure that this variable here evaluates to 1, because all of them are connected by an and, and we are looking for a satisfying assignment, so to do that, we have to set X2 to 0 so that this here evaluates to 1. Now for the next bracket, it works the same way, because again, we have X1 set to true, so the bracket evaluates to true, so the not evaluates to 0, and so again we have to set this to 1 which means X3 is 1 and we can go on through this the same way. So the formula forces us to set every variable except for X1 to false, and now the way that this formula is structured, of course, we could also have chosen to set X2 to true, but then if X2 is set to true, we have this here, so X2 here is 1, 1, 1, 1, which means in order to get these here to go to 1, so we need to have these here go to 1, because this here evaluates to 0, 0, 0, 0, 0, so we have to set all of these variables here to false. Yeah, finally, what about this one here? Well, since X2 is set to true, not X2 is set to false, but luckily, because we've set all of the variables to false, so we have 0, 0, 0, 0, 0. In this case, the large not here evaluates to true, so this bracket here also evaluates to true, and we have satisfied the Boolean formula.

What this means is not for any given number of variables, we can write a Boolean formula that can be satisfied if and only if exactly what one of these variables is set to true, and we figure out that we can do this for any number of variables that we want. So now let's go back to snapshots and see what the Boolean formula that I just showed you has to do with these snapshots. Each snapshot, as we've said, basically consists of which is basically the input. We have program, and we have the read-write memory for immediate results and output. So now let's think back a moment of how we defined the RAM. When we defined the RAM in the last unit, what we said was the following. We said that whenever we're talking about memory, so this part here or this part here, then every single cell in the memory can not hold arbitrarily large variables, which means, for example, that this cell here could either be 0; it could be 1, it could be 2, and so on, but there's a certain limit of how large those values here can be, just as with your novel computer programs, too. They can carry very, very large numbers but there's a limit and that limit is some constant, so I'll just write c here, so if you have an 8-bit computer, that constant will be smaller, of course, than if you have a 16-bit computer and so on, but in any case, it will be some constant determined by the machine, and the same thing is going to be true all along the memory. So for each single memory cell that you have here, you have a constant number of possibilities for the values that this memory cell can take, so the size here is constant, and here, it's polynomial in n, as we figured out before. Now what about the program? So for the program, we said that, of course the size of the program of the algorithm does not change with the size of the input, so if you look at the program and your write it as we said, we've written the code from left to right, but normally, you'll write it from top to bottom, so if you took this code and wrote it like this, then you would have a constant number of lines in your program and that would be a certain position on where you are, so again here, there's a constant number of potential positions where you can be in the code, and of course, you just have one code, so there's no polynomial size here. It's just a constant number of code lines. And finally, over here fr the input; of course this is read only, but what I'm interested in is again the possible number of states that you can have in each of these memory cells, and that again is the same constant as over here, and here we have again a polynomial in n, so what does this have to do with the Boolean formula that I just showed you? Well, what we could do is turn this into a Boolean formula with a huge number of variables, but you will see that the number of variables is still polynomial. So what are my variables going to be? I'm going to have one variable for each of those black boxes here and also one variable for each of these here, and so on, and so on, and so on. I'm going to have one variable for all of the possibilities where I could be in my program code, and again here I'm going to do the same thing as I did here on the left side. I will have one Boolean variable for each of those possibilities, and what one of those Boolean variables means is basically--or what I want it to mean is if it's set to true, it tells me that a memory cell contains that value, so if this variable here is set to true, I want that memory cell up here to contain the value 0. If it's set to 1, I want this one here to contain the value 1. If it's set to 2, I want this one here to have the value 2, and so on. And now you see why the Boolean formula that I just showed you where you can force just exactly one single variable to be true is useful in this case, because if I put all of these variables here into such a Boolean formula, I would have a Boolean formula that can be satisfied if and only if exactly one of those variables here is true, so it tells me uniquely what goes into this memory cell here as long as it's satisfied, and the same thing over here, and over here, and over here. And now if I write this Boolean formula for this memory cell here, I write it for this memory cell here, here of course also, and so on, and if I combine all of those Boolean formulas, so I have a Boolean formula here, I have a Boolean formula here, I have one here, from this column, from this column, from this column, and so on. So I have Boolean formula, and then I do an and, and I have the next Boolean formula, and I have an and, and I have the next Boolean formula, and I continue doing this, and I will also do this here for the program. And of course, I will get a very, very, very large Boolean formula.

And this is, I promise, going to be the most complicated part of this unit. Once you've understood this, you have understood how to prove the Cook-Levin theorums, so bear with me for another minute here. If we write this huge Boolean formula what I would like to ask you now is what properties this Boolean formula has. So if I write a big Boolean formula like this here, so the train that's exactly one variable here is set to true one here, one here, and so on and so on. What I would like you to think about is some of the properties that this Boolean formula will have. So there are 3 properties of 3 choices that I'm giving you. One is that the size of this Boolean formula here is polynomial in in. The second one is that this Boolean formula here has only a single satisfying assignment. And the third one is that every satisfying assignment of this Boolean formula presents a potential snapshot, so it kind of uniquely defines what the memory of the RAM looks like and where the program is at. Now of course, there's only 3 choices here; again, more than 1 can be true and you could make your way through by guessing, but before that, I'd encourage you to think through these statements, because once you understand this part, the rest of the Cook-Levin theory is actually quite easy.

And there's 2 correct answers here. The first one is that the size of this huge Boolean formula here is polynomial in n, and I will show you why. First of all, how large is each one of these formulas that we're building? Well, there's a constant number of variables and as I showed you before, if you want to ensure that for a constant number of variables, only exactly one of them can be set to true, then the resulting formula is about the square of the number of variables that you have and since we have a constant number of variables, each formula here is about the size of the square of that constant, so we have-- it's kind of O of c squared, and c does not depend on the size of the inputs, so in a way, we can even write this as O of 1. Now we have a polynomial number of these Boolean formulas, so we have a polynomial n times the constant; this is again a polynomial of n, so we're fine on this side here. We're also fine on this side here, because the formulas again are of constant size; it can, of course, be a huge size, especially if you have a system that can carry very large variables, but nevertheless, it's a constant size, and that's all we care about; it does not depend on the size of the input, and we have a polynomial number of them again here. And finally, the code also has a constant number of lines, and so the Boolean formula resulting from that will also have constant size, so overall, this huge formula here has a size that is polynomial in n. Does this formula have only one satisfying assignment? No, it doesn't; it has a huge number of satisfying assignments, but each of those satisfying assginments will ensure that exactly one of the variables here is set to true and so on, and this gives us the third property, which means that this satisfying assignment defines a snapshot, because it will tell you exactly what's in this memory cell here and so on. It will tell you exactly where the program is at and it will also tell you exactly what's here in this memory. And once we have this, you have actually completed the most difficult step of this unit in my mind, and we are now ready to show that sat is np complete.

So now our Boolean formula that we are building as it inputs to SAT is going to become even larger. So as you have seen before, the calculation that a non deterministic RAM makes for a given input can be represented as a polynomial number of snapshots, and I'm now going to draw the snapshots only like this; you know that represents the input, the program, the read-write memory, as well. We've already seen that the state of the non deterministic RAM at each point in time can be represented as a snapshot and that snapshot can be represented as the Boolean formula, so now that we found out how we can encode a snapshot into a Boolean formula, let's go back to our main mission, showing that any problem that can be solved on a non deterministic RAM can be encoded as a SAT formula. And the idea of this is to build a huge-- well, not that huge; it's all going to be polynomial size, but it's still going to be very large, so the idea is to build a very large Boolean formula as follows. each timestep of the algorithm is going to be represented as a snapshot. Then we're going to ensure that the first of these snapshots represents the memory of the RAM at t =0. What that means is that the program starts at the first line of execution and of course that memory contains the input that we're giving to that program or algorithm. Then thirdly, we have to ensure that the snapshots fit together. What I mean by that is even though it's a non deterministic RAM if at a certain point in time, say t = x, it is in a certain state, then at the next timestep, t at x plus 1, there's only a limited number of choices or potential states that machine can be in. So if we were having a deterministic RAM, actually as we already said, the state of the machine added time point x already clearly determines what state the machine will be in at point x plus 1. For a non deterministic RAM, if you use the if better operation then there can be 2 different states at the next time point. And then finally, we have to ensure that the Boolean formula that we're building has an input to SAT can only be satisfied if the algorithm returns yes at some point in time. Otherwise, because we have a decision problem, if the algorithm returns no, we do not want the Boolean formula to be satisfiable.

We already know how to do the first part of this. We just have to write a Boolean formula for each single timestep. The second part is also not that difficult. If we are given an algorithm and an input of that algorithm, we already know what the first snapshot of the RAM is going to look like. The program will start at the first line of the program code, and the memory contains nothing but the input itself, so given the way that the Boolean formula is written, all we have to ensure that we enforce the variables for the first snapshots to represent that state that we already know, but we can ensure that, so for example, if in a Boolean formula, you want a variable x1 to be set to true, you would write it like this; you would have certain statements of the Boolean formula, and then you would just put the variable x1 here, and then continue the formula, and then you know it has to be set to true, so in this way, you can enforce the Boolean formula that represents the first snapshot to present the memory of the RAM exactly at time point t equals 0. The fourth one here that the Boolean formula can only be satisfied if the algorithm returns yes; that is also not that difficult to check, because again using a technique like this here all we have to do is ensure that at a certain point in time, the algorithm here is at the program line or the RAM executes the line of the algorithm here or one line of the algorithm here that says we return yes so that algorithm gives us the yes value. So the only part that requires a little more thought is the third part here, how we can ensure that various snapshots fit together.

So let's take a quick look at where we are so far. What I first introduced to you is the concept of snapshots of the RAM, so you can look into the machine at any point in time that a polynomial time algorithm executes, and then you see exactly what state the RAM is at that certain point in time, so which line of the program it's executing and also what's in the memory. I've now also shown you that we can represent a snapshot as a Boolean formula. What that means is that I've shown you how to construct a Boolean formula so that when you have a satisfying assignment for this Boolean formula and there can be many. Then you can reconstruct this snapshot from the satisfying assignment. What we also know is the assignment of the variables for the first Boolean formula, so at time point zero, and the reason why we know that is that at the beginning, we know what the machine is doing, because in the memory there's the input, the algorithm starts at the first line, and in the memory where we have the output, and intermediate results, there's nothing in there, so here, we already know what the variables are going to be. Now why are we doing all of this? In the end, what we want to show is if we solve SAT for this huge Boolean formula that we're building and we're not done yet, if we solve SAT for this formula here, then we want to know what the machine is actually doing or in other words, what we want to get is a protocol of what the algorithm here has done. So each of the Boolean formulas here represents a snapshot, but now we need to ensure that they fit together, because once the machine starts out in this snapshot, it will move to the next one, and so we must make sure that if we have an assignment for the variables here that represent a snapshot, then the snapshot that is represented here must fit together with one, so that it's kind of a valid representation of what the RAM is doing here. And the way we're going to achieve this is, of course, more Boolean formulas so that we add to the game, and to do that, I would like you to think about some statements for a bit regarding how snapshots connected to each other. So what I would like you to tell me is which of the following statements is true? The first one is one a deterministic RAM, if we know what state that machine is in a time point to, we can clearly state what the state will be at time t plus 1. The second one is I want you to tell me if this here were true if instead of a deterministic RAM, we were talking about a non deterministic RAM. The third one is a non deterministic RAM behaves exactly like a deterministic RAM except when we use the if<u>better function,</u> and finally, if we use the if<u>better function,</u> there can be any number of next states, so not 1, not 2, not 3, but basically an arbitrarily large number of states, and we can not make any statement about that. So which of the following is true? Please check all of the boxes where the statements are true.

So the first statement is clearly true, and we talked about this a couple of times in this unit. If you know the state of the deterministic RAM, that means you know whats in the memory and where the program is currently executing, then you can easily tell what the machine will do as a next step. The second one is of course not true, because we have the if-better function, and so if we know the state of a non deterministic RAM at time point t, and at that point in time, we call the if-better function, then we can not say what the state will be at time t plus 1. There are basically either the first part of the code is executed or the second part of the code is executed. And this gives us the answer for the last 3 statements here, so this one here is false, because a non deterministic RAM does not behave like a deterministic RAM, but there is actually only one case where it does this, and that is when if-better is called, so this statement here is true, and when the if-better is called, there can not be any number of next states. There can only be 2, because you'll remember, the if-better function works as follows. You call if-better, then there's some code here, and then you have an else statement, and there's some code here. Then there's only 2 possibilities, so it's not any number of states; it is actually just 2, so this is also false.

Now we are almost there. What I would like you to do now is think about when you analyze an algorithm using pen and paper, how would you go about that? What you basically say is every program here and your variables here, so how do you get from here to here? So basically, it's a number of rules that look like this. If we are at a certain line of the code, so say we are here, and usually that line of code will also use some variables, but it doesn't have to, but usually it will of course, so certain variables are set to a given value. Then we know what the program is going to do next. For example, it's going to jump to the next line of code, and it's going to modify this variable here. So how many of these rules are there? Well, for a single line of code, it depends on the variables here, and let's assume that one line of code uses a maximum of 3 variables. If you were to count how many of the rules there are, it's the number of different values that one variable can take times the number of different values that the second variable can take times the number of values that the third variable can take. So for one line of code, that's actually just O of 1 as the constant. Why is that? Well, because we set that on the RAM variables can not get arbitrarily large, so there's only a constant number of different values that a variable can take, so even if we have 3 different variables and consider all of the combinations, it will be a huge number usually, but it will be a constant number for one line of code. Now if we're not looking at one line of code but all lines of code, then this actually doesn't change, because we have a constant number of possibilities for each line in the code and the program has constant size just as before. So there's a constant number of rules that will tell us exactly what the machine is going to do in the deterministic case. We're going to go to the non deterministic case in a minute, because there of course, as we just found out, the if-then rules don't work.

So how the machine behaves is a large collection of rules that look like if the variables are set in a certain way, then this is what happens next. Now the question is can we use a Boolean formula to express an if-then statement? And I will actually let you figure that out in our next quiz. So I'm going to give you the following Boolean formula, x1 and x2 and x3. We're going to do a big not over this or x4 and x5 and x6. And I'm going to give you 4 cases now to look at, so we're going to look at this part here and this part here, and I'm not including this big not here, so I'm just looking at how x1, x2, and x3 evaluates; the not is not included. So there are 2 possibilities of course. It can be either true, or it can be false, and those variables over here, completely independent over here, they can of course also be true and false, so overall, we have 4 different cases, and what I would like you to quickly tell me is in which of these 4 cases here, the whole Boolean formula is satisfied.

So the Boolean formula here is almost always satisfied. It's satisfied in this case over here. It's satisfied in this case over here and this one. The only one where it's not satisfied is this one here, and of course that's easy to evaluate, so if x1 and x2 and x3 evaluates to true, then we have the big not, which will make it go to 0 and 0 or 1 is equal to 1, so the Boolean formula is satisfied, but 0 or 0 evaluates to 0, so here the Boolean formula is not satisfied. For these 2 cases, the Boolean formula is simply satisfied, because if this goes to 0, then we have the big not over here, so it will go to 1 and 1 over 1, and 1 or 0 that also evaluates to 1. What is this have to do with an if-then? Well, it's actually quite simple. If this part here, x1 and x2 and x3 evaluates to 1, then this part over here must also be 1 in order to satisfied the Boolean formula. It does not work if this part over here is 0. In the other case where x1 and x2 and x3 evaluates to 0, then as you can see, we basically don't care what the other variables are doing, because the Boolean formula is already satisfied, and that is exactly how an if-then behaves. If the condition is satisfied, then we want something specific to happen. If the conditions are not satisfied, we don't really care what happens, or at least we're not really going to force anything to happen.

So finally, what about non determinism? What if we're using the if-better function at a certain line in the code? Well, then the rules here are actually much simpler than for a normal line of code, because any time we use the if-better, our rule is going to be that the program either goes to 1 line of code or another one, because that's all the if-better does. The if-better does not modify any variables, it doesn't depend on any variables. It just goes to either one line of code or the other one. So this is a very simple rule that we can use to say what will happen if the program at time point t point to an if-better.

So now it's time for us to put all of our building blocks together and we're just this close to showing that SAT is indeed NP complete. So we started out with a problem in NP and input fallout problem; of course, this here can be any problem in NP and this here can be any input fallout problem and what do we then do? We said if the problem is in NP, then there must be some algorithm for that problem which runs in polynomial time on a non deterministic RAM, so an algorithm or a program you can call it any way you want, so that will run in polynomial time, and the algorithm at some lines of the code either says yes or says no, but it's running in polynomial time in any case, and of course the input for the problem, well that is basically just the number of variables in the memory of the machine as it starts out. Then we took all this and of course it's a huge Boolean formula and you would probably never really want to write it out explicitly, but it exists and it contains a number of components that we have discussed. So one part of the Boolean formula encodes snapshots of the algorithm as it runs on the machine. Then we said we have to have 1 part that ensures that the first snapshot properly represents the starting conditions so that it properly the algorithm starting at line one and also the memory representing the input for the problem. Then what we just discussed is that through a number of rules that look like if-then, you can ensure that the snapshot is at a certain point in time t and at the following time point fit together, and then finally, since this here is a decision problem, we want to ensure that the Boolean formula can only be satisfied if at certain point in time the algorithm returns yes, so this is the last part we have to add to the Boolean formula, and this is actually quite easy to ensure, because once the algorithm reaches this line of the code, wherever it may be, and you can even have multiple lines where the algorithm returns yes, but once such line has been reached, the algorithm returns yes and stops, so this her is very easy to ensure. You just have to make sure that there's one snapshot where the algorithm is in a line that returns a yes. So once we have constructed this Boolean formula, what happens if we solve SAT for this formula? Then there's 2 cases that can happen, because SAT is also a decision problem, so either SAT returns yes or it returns no. What if it returns yes? If it returns yes, that means that there is a satisfying assignment for this Boolean formula, and a satisfying assignment will have the following property. First of all, it will encode valid snapshots of where the algorithm is and whats in the memory. Secondly, it will also ensure that the machine starts out in the right place, meaning it starts out at line 1 of the code and representing the input for the problem, it will ensure that all the snapshots that it has figured out will fit together and it can only return yes if the algorithm returns yes at some point in time. So if we have a yes here, we also know that this problem or the decision problem here is one where a non deterministic RAM would also answer yes. If it says no, on the other hand, what does that mean? Well, we know that we always can encode the snapshots, so it's not going to say no, because it can not encode a snapshot, because that's possible always. It also won't say no due to this property here, because well, we have ensured that the snapshot can properly encoded, so there's no mistake in the formula here, so it can not say no because of this here. It also can not say no because of this here, because the machine will always run and you can make the snapshots fit together. The only reason why there can be no satisfying assignment is if the algorithm does not return yes at a certain point in time and since this algorithm solve this problem here, if it can not reach yes, then this means that this decision problem here is also a no. So satisfiability only if this problem given this input here is a yes. So now that we have put it together, here's our final quiz for proving that SAT is NP complete, and what I would like you to do is to recap the properties of this Boolean formula here. First of all, what is the size of this Boolean formula for an input of size n and remember this algorithm here runs in polynomial time. So is this Boolean formula up here constant in size, polynomial in size, or exponential in size with respect to n? and then can this Boolean formula encode any algorithm at least for a problem that is an NP? Can it encode any input of size n that were given to that problem? Finally is this Boolean formula satisfiable if the input here is a yes to not decision problem here or are there any other cases where we could also have satisfiabilty. So please check all of these that are true.

And of course, once you've gone through the proof, this was a rather easy quiz just for relaxation after all of these Boolean formulas and satisfiabilities. So of course, this formula has polynomial size, because when you constructed it, we took care of that. It can encode any algorithm. We did not make any special assumptions or restrictions on this algorithm here. It can also encode any input, because that basically just means whatever we're given, we encode it into memory and we know we can do that, and finally, of course, that was the purpose of the construction, this Boolean formula up here is only satisfiable if solving this problem for this input returns a yes; otherwise, if it's a no, you can not satisfy this formula. And this of course is the whole purpose of the construction, because what we have now shown is that solving SAT on this Boolean formula is the same as solving this problem for this input here, and what that means, of course also is if you can solve SAT on this Boolean formula here in polynomial time, then you have a polynomial time simulation of this algorithm here running on this input.

So congratulations! We have now shown that any problem that is in NP of size n can be transformed into a SAT problem with size polynomial in n. And what that means is that SAT is NP complete. Now I know that it was a lot of work to prove this and you can be proud of yourself for staying along, and I actually have to thank Steven Cook and Leonid Levin, because thanks to their work, I already know how to do the proof. Just imagine what they must have gone through before they come up with this.