And this concept here of taking an input and transforming it in polynomial time into a new input for another problem so that you get the same solution is known as a polynomial time reduction. And if you want to be precise about what you're applying this reduction to, you can say that you have a polynomial time reduction from X to Y, so in other words, you can take a problem X and show that problem Y is powerful enough to solve problem X. So in order to help you wrap your head a little bit more around polynomial time reductions, we'll do a little quiz here. So I would like you to think about if we manage to reduce a problem X to another problem Y, what does that imply? And I'm going to give you 4 choices here, and I know it's easy to guess which one of those are correct, but you should really give this some thought first to make sure that you understand the concept of reductions because this is going to play a very important role in this unit and also in the next units. So I'll give you 4 choices here. Does reducing a problem X to a problem Y imply that: X is at least as hard to solve as Y, and by hardness, I mean the distinction between polynomial time and exponential time. So in this case here for example, if Y is only solvable in exponential time, does that mean that X can only be solved in exponential time as well or is it other way around? Does it imply that Y is at least as hard to solve as X? Does it mean that if X can be solved for the polynomial time, then so can Y or does it mean that if Y can be solved for the polynimial time, then so can X? So please check all of the answers here that are correct.
And there are two correct answers here and that is the second answer and the fourth answer. And the reason for that is as follows. If you managed to have a polynomial time reduction from a problem X to a problem Y, that means that you can take any input to the problem X and then use Y to solve it. What it does not mean is that you can take very input for Y and use X to solve it. The reduction is only one way. But that means if you managed to reduce the problem X to a problem Y, then solving Y must require at least as much effort or power as solving X. So that means that Y is at least as hard to solve as X and not the other way around as I just said. Now on the other hand, since Y is at least as hard to solve as X that means if you find a way to solve Y in polynomial time then of course you can also solve X in polynomial time because this reduction here takes polynomial time and then solving that problem would also take polynomial time. On the other hand, if you can solve X in polynomial time, so this problem here, that doesn't really mean anything for Y because as we just said Y is at least as hard to solve as X but it doesn't have the same hardness or is easier in any way. This implication here does not hold true and those two are the only ones that are true. Now if this quiz confused you a little bit, I find a good way to memorize this what the reduction from the X to Y means is that if you have a reduction from X to Y that basically means that this problem X here must be able to fit into the problem Y and that is why this problem Y here is in a way larger or harder to solve than X so you take this problem X and put it into Y that is what the polynomial time reduction can be pictured as and at least from me that's very helpful. And of course I do hope that it is for you as well.
Now that you know about polynomial time reductions, imagine that we had one problem in NP such that every other problem that is also in NP could be reduced to it. We would have vertex cover which could be reduced to that problem, we would have independent set which could be reduced to that problem, and also clique, of course. And any other problem in NP as well. No matter how complicated that problem as long as it's NP that means a non-deterministic RAM you can solve it in polynomial time-- there exist a polynomial time reduction to this ultimate problem. And such a problem actually has a name, so if you could find a problem in NP such that any other problem in NP is just as hard to solve or easier to solve, then that problem is called NP complete because in a way it represents all of the problems in NP because if it's a problem that any other problem can be reduced to, an NP complete problem is basically capable of solving or representing all other problems that are contained in NP. An NP complete problem is contained in NP and the important part here is that really any other problem in NP can be reduced to it. Not just vertex cover, not just independent set, not just clique but any problem that lies in NP can be reduced to that problem. Let's have a little quiz to familiarize yourself with the concept of NP completeness. I have three statements here about NP complete problems and I would like you to tell me which one of these is true. There can be more than one.
The first statement, well you might be tempted to think that because I just called it the ultimate problem in NP, but actually when you look at the definition of an NP complete problem-- it just means that any other problem in NP can be reduced to it. There could be multiple NP complete problems as long as they have the same hardness. This here is not true. If you found a polynomial time algorithm for an NP complete problem, then what that would mean is that there would also be polynomial time algorithms for all other problems in NP, and the reason for that is that for an NP complete problem, we have said that any other problem in NP can be reduced to it and that can be done in polynomial time. If you can solve this problem in polynomial time as well, then you automatically have a polynomial time algorithm for any problem that you can find in NP. And finally, if vertex cover would be an NP complete problem, then clique would be an NP complete problem as well because we have shown that you can use polynomial time reductions to get from vertex cover to clique and vice versa. From a perspective of polynomial time versus exponential time hardness, these two problems are basically equally hard to solve. This one here is also true.
So let's say now that you have a problem that you know to be an NP such as vertex cover, and there's basically two ways that you could show this problem to be NP-complete. The first one is, you could take another NP-complete problem and reduce it to your new problem. A second way would be to use the definition that we just had of NP-completeness and show that this problem here can be used to solve any other problem that is also solvable in NP, and by showing, I mean proving. That is the first way, and the other way would be to just use the definition of NP-completeness that we just looked at, so what you could also do is you could show, or in this way you would have to prove mathematically, that any problem in NP can be reduced to your problem. And the emphasis here is on any problem, because otherwise, it wouldn't be NP-complete. And those are the two possible ways you could show NP-completeness. I will call this here approach number one and this one up here approach number two. Now if you look at those two approaches, this one here sounds much easier in a way than this one here, because here we just have one problem where we have to have a reduction, and here we have to have any problem. Not even some problems, not even a million problems, but really any problem. But there's a little problem with the first approach here, and I hope you can tell me what that is. So what could be potential issues with approach number one? Could it be that this only works for some problems in NP? Could it be that before we can do approach number one, we must first have had success with approach number two for at least one problem in NP? Or could it be that approach number one is not always correct? So please check all of these statements which you believe are true.
And the answer here is that, well, this approach, number one, it in principle works for any problem in NP, so that is not the issue with this approach, and this approach is, of course, is also always correct because if you can show that you can reduce an NP-complete problem to another problem, that problem here becomes NP-complete as well. The issue is that it's a bit of a chick in an egg problem, because in order to be able to use this approach, you must first have an NP-complete problem, and unless you have one NP-complete problem, this approach here will fail, so you need a kind of seed problem that you show to be NP-complete using this approach here before you can use the more convenient approach down here. So unfortunately, while we would always like to work with approach number one, we can only do so if we have had success with approach number two at least once.
So the big question is here, how on earth are we going to do this? Because this seems extremely difficult to show, so we would have to find a problem for which we can prove that any other problem in NP can be reduced to it. Luckily for us or actually more luckily for me, this work has already been done about 40 years ago by showing a problem called Boolean satisfiability, or SAT for short, to be NP-complete, and this result is one of the most famous results in theoretical computer science, which is why we're going to investigate it in detail and also investigate the proof together. The result is called the Cook-Levin theorem. This theorem is named after Stephen Cook and Leonid Levin who discovered it independently of each other in the 1970s, and by showing that, they provided exactly this NP-complete seed problem, namely SAT, which could then be used to show that a number of other problems are NP-complete as well. And I'm then going to show you how Cook and Levin proved that SAT is NP-complete, and once we have shown that SAT is NP-complete, we can go back to the problems of Alice, Bob, and Carol and see if those problems are NP-complete. Before we go deeper into the SAT problem and proving the Cook-Levin theorem, here's one more quiz to make sure that you understand where we are going with this. So once we have shown that the SAT problem is NP-complete, how would we then show, or at least try to show, that the vertex cover problem, independent set, and clique are NP-complete? Would we have to show that any input for SAT, and I'll soon tell you what that input exactly is, can be transformed into an input for one of these problems, and by these problems, I mean vertex cover, independent set, or clique, or would we have to show that one of the three problems up here can be expressed as an input to SAT? Or would we have to show that all three problems can be expressed as an input to SAT? So please check all of these which are correct.
And this time we only have one correct answer, which is this one up here. So once we know that SAT is NP-complete and we wanted to show that those three problems, vertex cover, independent SET, and clique are NP-complete as well, we would have to show that if we take any input for SAT, then we can transform it into one of those three problems up here and then solve that. Because that shows us that in a way vertex cover or independent SET or clique are at least as hard to solve as SAT. And because SAT is NP-complete, if one of these problems is at least as hard to solve, then that problem must be NP-complete as well. These two down here are the wrong way around because SAT is NP-complete, which means that this is one of the hardest problems that you will find in NP, so just showing that for example, you could take vertex cover and transform it into an input for SAT would only show you that SAT is at least as hard to solve as vertex cover, but that is the wrong way around, because we want to show that vertex cover is at least as hard to solve as SAT, so those two answers down here are not correct.
So what is this mysterious SAT problem? So SAT stands for Boolean satisfiability, of course, that also doesn't tell you very much, but I'll soon get into what Boolean satisfiability is. You have to just note that SAT is one of the most famous and most studied problems in theoretical computer science, and it's the bases of many marked results, so it's always a good idea to familiarize yourself with what SAT is about and why SAT can be so hard to solve. So what is the Boolean satisfiability problem? The input for SAT is a so called Boolean formula, and that actually sounds much more intimidating than it is because Boolean formulas are very simple structures. They consist of only 4 building blocks. As any kind of formula, we have variables, and those variables I'll write in this course usually as X1, X2, X3, and so on. And the nice thing about these variables is so in a normal formula whenever you have X that can take a number of different values, but in a Boolean formula, a variable X can only take 2 different values. It can either be true, which is sometimes also written as a 1 or it can be false, which is sometimes also written as a 0. The second thing you can have in a Boolean formula is an operator called not. And how you write that is, if you say, have a variable X1, then this here would be not X1 or if you have X2, this here would be not X2, and what the not does is it turns a true into a false, and vice versus, so it also turns a false into a true. So for example, if X1 equals true, then not X1 equals false. So it basically just flips that variable. Then you have another operator, and that one is called and. And this and is usually written like this, and it works on 2 variables, so you have X1 and X2 or any other variable. And the way it works is that this expression here, X1 and X2 is true only if X1 is true and X2 is true. Otherwise, it's always false. So I'm going to write this as 1s and 0s here just to make it a bit shorter, but you know that 1 stands for true and 0 stands for false, so it's the same. So 0 and 0 equals 0, and only 1 and 1 equals 1. And the final component of a Boolean formula is another operator called or, and or is written just the other way around, so it's written like a flipped and. So it's X1 or X2, and as the name already suggests, that is set to true or set to 1 if at least 1 of these variables here is set to true, so 0 or 0, that will still be 0, but 0 or 1 will be 1, and 1 or 1 is also going to evaluate to 1. So let's practice this in a little quiz. Let's say you have the following Boolean formula. X1 or not X2 or X3 and not X1 or X2, and we're going to have that X1 is equal to true or 1, X2 is equal to true, and X3 is also set to true or 1. Is this whole formula equal to true or is it equal to false? So please select this button here if you think the whole formula is equal to false or 0, and please select this one here if the whole formula is equal to true or 1.
So the answer here is 1 or true, and you can see that as follows: So X1 is equal to 1, not X2 is equal to 0, so 1 or 0 is equal to 1, so this whole part here is equal to 1, and actually X3 is also equal to 1. So we have that this whole part here is equal to 1, X3 is equal to 1, and 1 or 1 is 1 again, so this whole part here is equal to 1. And over here we have not X1, which is 0 or X2, but X2 is true or 1, and so the whole thing 0 or 1 is equal to 1, and so we have 1 and 1, which evaluates to 1.
So now that you know about Boolean formulas, so what is the SAT problem? The SAT problem has as an input a Boolean formula with N variables, and I'm usually going to write them as X1, X2, and so on until you get to Xn, and the output or question since SAT is a decision problem is the following: And the question is, can you set the variables X1 to Xn to a combination of true or false so that the whole formula that you're given in the input becomes true? And the answer to that can, of course, only be yes or no, so again, SAT is a decision problem. So for example, if we're given a very simple Boolean formula such X1 and X2, and then the answer to SAT would be yes because you could set X1 to true and X2 to true, and then the whole formula would also evaluate to true because you have 1 and 1, which is true or 1. Let's do a little bit more challenging example. If you had a Boolean formula such as this, x1 and X2 or 1 and not X1 or X2, and what you can also see here is that you do not always have to put brackets around each or to signify in which order you evaluate the individual ones, so it's okay to write 2 or 3 or 4 or's or 2 or 3 or 4 and's or even more if you want to without doing those brackets. It really doesn't make a difference in the way that the formula evaluates. But let's come back to SAT. So does this Boolean formula here have a satisfying assignment, meaning can you set the variables to a combination of true and false so that the whole formula becomes true, and here the answer is again yes. X1, unfortunately, doesn't really help here. So if you set X1 to true for example, this would also be true. And here you have not X1, so X1 and not X1, that would evaluate to 0. If you set X1 to 0, then you have it the other way around, and then let's set X2 to 1 because then this whole thing here is 1, this stays at 0 no matter what we do, and this one also goes to 1, and then you have 1 or 0 or 1 so there's at least one 1 in here, and so the whole thing evaluates to 1, and the formula can be satisfied. So now let's see if you can figure out an even more challenging example yourself. So here's our Boolean formula, X1 or X3 and X1 and X2 or not X3, and the formula continues here, not X2 or not X3 and not X1 or not X2, and I'm going reveal to you that this formula indeed has an assignment of true and false to the variables X1, X2, and X3 so that the whole formula is satisfied, meaning it evaluates to true. What I want to know from you is how I have to set the variables to do this. So if I should set X1 to false, please check here. If I should set it to 2, please check here, and the same thing for the variables X2 and X3, please.
And the answer here is that you should set X1 to true, X2 to false, and X3 to false, and the way to figure this out is, at least for me, it was the following: So you have to set X1 to true because otherwise this thing here will be false, and you see we have a number of N's, so if this evaluates to zero, the whole formula cannot be satisfied. So we already know this assignment here, which gives us a 1 here and a 0 here also. Now since we have a 0 here, we must make sure that not X2 evaluates to true, so we have to set it to false, and if we set it to false, we get a 1 here, get a 1 here, and a 0 here. And since this here has been set to 0, and again we need to make sure that these brackets evaluate to true. We also have to set X3 to false so that this here becomes a 1, which gives us a 1 here as well and a 0 here as well. And so for the whole thing, this here evaluates to 1, this here evaluates to 1 because this evaluates to 1, and this one here as well. This goes to 1, and this goes to 1 here as well. And so we have 1 and 1 and 1 and 1, which satisfies the whole formula.
Now if you have paid attention so far, I hope you have noticed that I forgotten to mention one detail about the satisfiability problem so far, and I would like you to tell me what that is. So what do we still need to talk about? The length of the Boolean formula that we're given as an input to SAT? The number of variables or the number of times each variable occurs? So only one of them is true, and I would like you to check the one which is true.
And the answer here is that we need to say something about the length of the formula, because as you know, we are measuring running time as a function of the size of the input. Now if you have N variables, I think it's very convenient to measure the running time as a function of N, but we can only do that if we know that the length of the formula is also some function of N or if the length of the formula is polynomial in N to be more precise. So just to be very precise, from now on we will always assume that the total length of this Boolean formula that we are given is polynomial in N, the number of variables.
So now that you know what the SAT problem is, how do we actually show that SAT is NP-complete? So what was the main ideas that Cook and Levin used to prove this, because it still sounds quite difficult doesn't it? To show that SAT is NP-complete, what we basically have to show is that for any problem in NP there is a polynomial time reduction to SAT, and before we go into the details of this proof, I want to show you the 3 main ideas that are used to show this. So the first one is, well, it's not really an idea, it's the trivial definition of what it means that a problem lies in NP. If a problem lies in NP, that means it can be solved in polynomial time by a non-deterministic RAM. So why do we state this? Going back to the definition actually allows us to make a very useful observation, because if we can solve the problem in polynomial time on a non-deterministic RAM, then that means there must be some polynomial time algorithm. Of course, that algorithm is going to use if-better function because it's running on a non-deterministic RAM, but we know that if a problem is in NP, then even without explicitly having to come up with this algorithm, we know there must be one because otherwise this problem would not be in NP. Which brings us to the third point, and that is the main idea that Cook and Levin had, and that is instead of having to show that any problem in NP can be encoded as a Boolean formula, what they did is they showed that any algorithm can be encoded as a Boolean formula, because if you can show that, with certain constraints of course, but basically the idea is if you can show that any algorithm can be encoded as a Boolean formula, then you can also encode this polynomial time algorithm here as a Boolean formula, which can then be used to solve an NP-complete problem. So if you can encode any algorithm, then you can also encode any algorithm that will solve a problem in NP. And of course, we're going to go into the details, but that was their main observation. They showed that we should not look at the problems in NP, but we should actually look at the algorithms that solve those problems in NP, and we know there must always be such an algorithm because otherwise the problems wouldn't be in NP.