
So now that you've learned about search trees and pre-processing, you already know quite a lot about the loopholes that we can find in the laws of NP-completeness in order to actually tackle an NP-complete problem, because you can now get from algorithms that would normally take 2 to the power of N time down to algorithms with a much smaller exponential base, which already makes many more instances solvable. But to conclude this unit, I want to show you another loophole in the laws of NP-completeness that is sometimes very, very useful and that actually not very many people know about. So I'm going to state another law from the book of NP-completeness, and that law, in which we will soon poke loopholes, is: larger instances are harder to solve. And that is actually quite obvious, because we said that we were going to look at worst-case time and always measure the running time of an algorithm as a function of the input size. Say we have an algorithm that runs in O of 2 to the power of N times N squared time. Then a bigger input size means more running time required for the algorithm in a worst-case scenario. Let's talk a little bit about this N here, the size of the input, because it is often a very coarse estimate of how hard an input actually is to solve, and I'm going to give you one example of this, or actually, I'm going to quiz you about it. Given the search trees and the pre-processing, which of the following instances of vertex cover is easier to solve? And I know this is an easy one. In case you're wondering, the number of vertices in each of the graphs here is the same.

Vertex cover is so much easier to solve for this graph over here. It would actually be solved by pre-processing alone, although both graphs have the same number of vertices. It seems like the size of the input is actually not the best way to measure how hard it is to solve an NP-complete problem for that input. It seems kind of unfair to just say to this graph here: well, you're just as large as this one over here, so we're going to assume that you're just as hard to solve.

And that is why you could think of hardness, meaning how hard a specific input is to solve, as a combination of 2 things. One is the size of the input, and we always measure this as a number N. But the other thing is the structure of the input. In a way, it's not larger instances are harder to solve; rather, you could say, harder instances are harder to solve. This of course might seem like a bit of nitpicking, so the question is, is this actually useful? And the answer is yes. For some problems, you can actually express the structure of the input as a number K, and I'm soon going to show you how to do that. And while in classical NP-completeness, so to say, you would measure hardness as some function of N, you can then measure hardness as a function of 2 parameters. One parameter is still the size of the input, and the other parameter is going to specify something about the structure of the input. Now this might sound a bit abstract, so what could this parameter be? I'm going to show you 2 possibilities for this parameter, and we're going to discuss them in detail. One is the size of the solution, and the other one is something I would call distance from triviality. So for example, for the network from the last quiz, you could actually see that both parameters would tell us that finding a solution here is easy, because first of all, the solution here, an optimum vertex cover, is very small; you only require 1 vertex. And also, you could argue that this graph here is very trivial, because all you need is a pre-processing rule to actually find the best possible solution. But again, we'll get into more detail. Let's start out with the size of the solution.

And now we already know that we can solve vertex cover using a search tree. So we start with the original graph, and I'm now going to use again the search tree where we branch into only 3 different possibilities. We know it's not the best possible one, but it will be easier to understand the rest this way, and it also works for the more complicated search trees. So then on the next level, we again branch into those 3 possibilities, and so on. And we said that the search tree had a height of N/2, and the total number of solutions that we looked at in the search tree was 3 to the power of N/2, which as you remember was about 1.733 to the power of N. Now I would like you to quickly think about the following: as you'll remember, at each level of the search tree, we're looking at an edge where the 2 end points have not yet had an assignment, and we're branching into 3 cases, this one, this one, and this one. So now here's a very easy quiz, actually. How many vertices are added to the vertex cover each time we go one level deeper into the tree? At least 1, at least 2, exactly 1, exactly 2, or is this something that actually depends on the structure of the input graph?

And the answer here of course is that at least 1 vertex is added to the potential vertex cover each time we go down here in this tree, or we branch, because there are only 3 possibilities: in this possibility here, we add 1 vertex, in this possibility here, we also add 1 vertex, and here we add 2. And of course, there then comes data reduction and all that jazz, but we're always adding at least 1.

So what happens as we go down here: at least 1 vertex is added, and so on. So what does this have to do with measuring the hardness of vertex cover? Well, we'll get there in a minute. What I would like you to think about now is the following: let's say we were looking at the decision variant of vertex cover, where as input we were given a graph and an integer K, and our output was an answer to the question, does the graph have a vertex cover of size at most K? So now here comes my next question to you: what is the maximum height of the search tree if we are not using it to solve the optimization version of vertex cover but the decision version of vertex cover? Would that height be N/2, would that height be K, would that height be N, or is it something we can't say? And please choose the best possible answer here.

And now this is the interesting insight here. The maximum height of the search tree is K. Why is that? Well, we just noticed that each time we go 1 level deeper in the search tree, we add at least 1 vertex to the vertex cover. If the graph is supposed to have a vertex cover of size at most K, then the maximum number of levels that we can go down here is actually K levels. So for the decision variant, what we have here is that the height of the search tree is no longer N/2, but it's actually bounded by K, and K is the size of the vertex cover that we're looking for.

Now my next question to you is: what does this mean for the number of assignments that we're looking at in this new search tree? Is that still 3^(n/2), is it k^(n/2), is it 3^(k/2), or is it 3^k?

What you can easily see is that the size of this search tree here is actually 3^k, because we branch into 3 possibilities each time we go one level deeper, and this can happen at most k times. So the size of the search tree is 3^k.
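This bounded search tree fits in a few lines of code. The sketch below is a hypothetical minimal implementation, not the course's reference algorithm: for brevity it branches into only 2 cases per uncovered edge (either endpoint must go into the cover), giving a tree of size at most 2^k instead of the lecture's 3-way branching. The key property is the same: the recursion depth is bounded by k, not by the number of vertices.

```python
# Decision variant of vertex cover via a bounded search tree:
# "does the graph have a vertex cover of size at most k?"

def has_vertex_cover(edges, k):
    """edges: list of (u, v) pairs. True iff a cover of size <= k exists."""
    if not edges:
        return True               # no uncovered edges: the cover is complete
    if k == 0:
        return False              # edges remain but the budget is used up
    u, v = edges[0]               # pick any uncovered edge: one endpoint must be chosen
    remaining_u = [e for e in edges if u not in e]   # branch 1: put u into the cover
    remaining_v = [e for e in edges if v not in e]   # branch 2: put v into the cover
    return has_vertex_cover(remaining_u, k - 1) or has_vertex_cover(remaining_v, k - 1)
```

For example, a star with 5 leaves has a cover of size 1 (its center), while a triangle needs 2 vertices, and the recursion never goes deeper than k levels regardless of how large the graph is.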

Of course, this is only a very coarse analysis, so you can have much better search trees, and even for this tree here, you can do a better analysis, because actually, this tree doesn't always have height K for the 3 cases that you branch into. In one of the cases, as you'll remember, you put 2 vertices into the vertex cover, so this one here, this one here, and here you put 2. That already decreases the size of the tree, but doing this type of analysis is out of the scope of this course. If you were to do it, then you would find out that the size of the tree is approximately 2.4 to the power of K. And again, there are even better search trees for this. The main point however is this: look at what we have just done. We have just discovered a search tree for vertex cover for which the size does not depend on the size of the input at all. The size of this search tree only depends on the integer K. It doesn't matter if we do it for a graph with 10 vertices, 100 vertices, or 1,000 vertices; the size of the search tree only depends on K. And it gets better if we implement this as an algorithm to solve vertex cover, so the decision version of vertex cover. Now look at this running time here and what that means. The polynomial factor here will depend on how you implement it, but N to the power of 3 will do here.

The decision variant of vertex cover can be solved in 2.4^k * n³ time, and this is by far not the best algorithm for vertex cover there is; I'm soon going to tell you a little bit about that.
Even with this algorithm, think about what that means, and I'm actually going to quiz you on this one.
Does it mean that for any fixed k, vertex cover can be solved in polynomial time?
Does it mean that the running time is independent of the size of the input graph?
Or have I been tricking you all along and it's actually the case that P=NP?
Please check all of those that are correct.

Of course, this one here is correct. If we say we want to find out whether a graph has a vertex cover of size 10, then all we need is polynomial time, because then this decision problem can be solved in O of 2.4 to the power of 10 times N cubed time, and the exponential factor here is no longer dependent on the input size. So for a fixed value of K, vertex cover can indeed be solved in polynomial time. Of course, the running time is not independent of the size of the input graph, because you still have this polynomial factor here. Does this statement here mean that actually P=NP? Unfortunately, it doesn't, because K can take any value, so K can also depend on N in the general case. But what it does mean is that if we consider any fixed value of K, then vertex cover becomes easy. In other words, small vertex covers are easy to find.

Now, so far, we've only worked with the decision variant of vertex cover.
So, what about the optimization problem? Actually, that's not that difficult.
We just run through all values of K. So we will first start off assuming K=1.
Then our algorithm requires O(2.4¹ * n³) time; then we'll increase K to 2, and the algorithm requires O(2.4² * n³) time.
So each time we're solving the decision problem here,
and we increase K even further to K=3 and so on and so on and so on.
Until we hit some value k′, which is the size of the smallest vertex cover of the input graph.
Of course, we don't know what that size will be in advance.
But once we have increased K enough, solving the decision variant will give us the answer, yes.
Now, how much running time is this in total?
Well, you need to know a bit of discrete mathematics for this, but the total running time is O((2.4¹ + 2.4² + ... + 2.4^k′) * n³), and since this geometric sum is dominated by its largest term, the total is still O(2.4^k′ * n³).
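The "try K = 1, 2, 3, ..." scheme just described can be sketched as a small wrapper around any decision procedure. This is a hypothetical illustration, not the lecture's 2.4^k algorithm: `decide` is a simplified 2-way-branching search tree for the decision variant, and `minimum_cover_size` raises k (starting at 0, to also handle edgeless graphs) until the answer turns to yes.

```python
# Solving the OPTIMIZATION version of vertex cover by repeatedly
# calling the decision version with growing budgets k.

def decide(edges, k):
    """True iff the graph given by `edges` has a vertex cover of size <= k."""
    if not edges:
        return True
    if k == 0:
        return False
    u, v = edges[0]                # one endpoint of this edge must be in the cover
    return (decide([e for e in edges if u not in e], k - 1)
            or decide([e for e in edges if v not in e], k - 1))

def minimum_cover_size(edges):
    """Size k' of a smallest vertex cover, found by raising k step by step."""
    k = 0
    while not decide(edges, k):    # each failed call costs O(c^k * poly(n))
        k += 1
    return k                       # the first "yes" answer gives the optimum k'
```

Because each call is exponential only in k, the geometric sum over all attempts is dominated by the final, successful call.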

So now coming back to this picture here, you see what I meant when I said that the hardness of a problem can be measured as a function of both the size of the input and the structure of the input. In the case of vertex cover, it turns out that what makes solving vertex cover hard is actually the size of the vertex cover itself. So if there's a small vertex cover, then the problem is easy, and if it's large, then the problem becomes hard. Now of course, this immediately raises a question for the book of the laws of NP-completeness, and that question is: is every problem fixed-parameter tractable? That depends on what you consider as a parameter, and one parameter I will soon introduce to you is a parameter called distance from triviality, but unfortunately, in general, not every NP-complete problem seems to be fixed-parameter tractable. So for vertex cover, we have just shown that if you use the size of the solution as a parameter, then yes, this problem is fixed-parameter tractable using K, and by K I mean the size of the solution, of course. Now what about clique and independent set? Again, we can consider these 2 problems together because they're so similar. And here, unfortunately, I have a bit of bad news for you, because using K, by which I mean the size of the solution, these problems are highly unlikely to be fixed-parameter tractable. By highly unlikely, what I mean is that it would not be exactly the same as proving P=NP, but there is still evidence, in a way that is similar to P versus NP, that clique and independent set are not fixed-parameter tractable, at least if you consider the parameter to be the size of an optimal solution. Now what about shortest tour?
Well, one possible parameter here would be D, the size of a shortest tour, and here again, it seems somewhat unlikely that the problem is fixed-parameter tractable. But for shortest tour, another parameter has been investigated that actually seems rather promising, and that parameter is called distance from triviality, and there it turns out shortest tour actually could be fixed-parameter tractable. Now what is distance from triviality?

Of course, we'll do a little quiz to have you figure that out. What I would like you to think about is the following: I'm going to give you 3 graphs, and I'm going to ask you for which of these graphs shortest tour is the hardest and the easiest to solve. So here you have 3 graphs, and each of these graphs has the same number of vertices, in case you're wondering, or at least I hope it does; I counted twice, but you never know. And what I would like you to tell me now is for which of these graphs you think shortest tour is the hardest to solve and for which of these graphs you think it is the easiest to solve. I've not written any distances on the edges, but you can basically assume that I would be able to write any distances that I want on these edges; it's more of a question about the structure of these graphs. For the one that you think is hardest, I want you to enter the number 3 in one of these boxes, and for the one that you think is easiest, I want you to enter the number 1, and guess what number I want you to enter for the one that is in between. So please, go ahead.

Now again, of course, this is a bit debatable, but I think you will all agree that this one here is actually super easy to solve shortest tour on. Because let's say you start here, this is your main station where you start. You go all the way here, and then you have no other choice but to go here, because you have to visit all those vertices, and then you go back. So there are not really difficult choices that you have to make when you solve shortest tour for this one here. Now this one here is a little bit harder. It doesn't really matter where you start out, because for a tour you can start out at any point; you have to come back to it anyway. So let's say you start out here, and then you go up here, and then you already have your first choice to make: do you go this way, or do you go that way? Well, probably you want to go that way, but actually there are certain distances where you might want to go this way, so you have to make a choice here. Let's say we go this way, and here we have to make a choice again: do we go this way and back here, or do we go this way and here? So a little bit harder to solve, because we have to make certain choices, but still not quite as hard as this one over here, which I would think is the hardest, because say we start out here: oh, first choice to make, this way or that way? Well, that way. Oh, again, a new choice, this way or that way? Let's go that way. So all the time we have to make choices about which city to visit next, and this is really tough. So what does this have to do with what I just called distance from triviality?
If I mark the vertices in each of the graphs that have more than 2 neighbors, then here we have just 1, here we already have a couple of them (and in this one also, although this one is fairly easy, but let's be consistent here), and here I have a lot of them. And I'll give you one additional graph, which looks something like this. Now this one has a bit fewer vertices, but that's not the point here. The point is that this graph down here doesn't have any vertices that I would color, because all vertices have exactly 2 neighbors, and this here is what I would call a trivial instance for shortest tour, because you don't really have to make any choices about where you want to go. Here you also don't have to make any choices. But what you see here is that the number of vertices that have more than 2 neighbors actually is an indicator of how hard it is to solve shortest tour on that respective graph. If you have few of these blue vertices here that have more than 2 neighbors, then it's easier to solve shortest tour. Now I cannot give you the precise algorithm for this here, because that is actually a bit more complicated to figure out and doesn't only have to do with these vertices here, but I think you get the intuition that we could try to use the number of these blue vertices, or some additional structural information, as a parameter, and it turns out that if you do this the right way, then shortest tour is, in a sense, fixed-parameter tractable.

Now let's do a final quiz here, using again distance from triviality, so that you have a chance to understand this concept a bit better. Let's consider 3-SAT this time. You already know that 3-SAT is NP-complete, and you know that 2-SAT is contained in P, so 2-SAT is polynomial-time solvable. So let's consider 2 Boolean formulas here. This is the first one here: x1 or not x2 or not x3, and x1 or not x5 or x6, and x2 or x5 or x7, and x3 or not x6 or not x7. Now it's not really difficult to see if either of these 2 formulas has a satisfying assignment or not, and that's not really the point. What I would like you to think about is the following: considering x1, x2, and x3, which of these Boolean formulas is closer to triviality? And by triviality I mean, which one of these is closer to a 2-SAT formula? So look at x1, x2, and x3 and consider what would happen if you were to assign them true or false values. And then you just check the appropriate box.

And I think the correct answer here is the one above. Again, it's a bit subjective with these small examples, but otherwise it would just be too complicated for you to see and explain. So my point here is basically this: if you have decided what you're going to do with x1, x2, and x3, then the following thing happens. You've already decided what you're going to do with x1, you've already decided what you're going to do with x2, and you've already decided what you're going to do with x3, and in that case, what remains is an instance of 2-SAT, and that is solvable in polynomial time. Of course, this is a very simple example, so you could have already figured out how to solve it; there are many different ways to solve it. But I just wanted to show you what I mean by the distance of a 3-SAT instance to a 2-SAT instance, because over here, once you decide what to do with these values, well, you have it figured out for this clause here, you have it figured out for this one here, but you have not figured it out for that one here. So in a way, the formula above is closer to a polynomial-time solvable instance of SAT than the one down here. Of course, you would need a really, really large and tedious Boolean formula to actually take advantage of this. But in fact, many professional SAT solvers, or SAT solving packages, actually make use of this fact. There are a number of variants of SAT that are known to be solvable in polynomial time, and what these solvers do is try to figure out whether the Boolean formula they are given is actually close to one of those polynomial-time solvable variants. Actually, I don't even know if somebody has considered this as a parameter for fixed-parameter tractability, so there might actually be an open research problem for you here, if you're ready to get into it.
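The idea of being "a few variables away from 2-SAT" can be sketched in code. This is a hypothetical illustration, not a production SAT solver: the names `CLAUSES` and `BACKDOOR` encode the quiz formula and the assumed special variables x1, x2, x3. It tries all 2^3 assignments of those 3 variables; in this formula every clause contains one of them, so each branch leaves a 2-SAT instance, which is then checked in polynomial time via the implication graph (here with a simple reachability test rather than a full linear-time SCC algorithm).

```python
from itertools import product

# Literals are signed integers: +v means xv, -v means NOT xv.
# The lecture's formula:
# (x1 v ~x2 v ~x3) ^ (x1 v ~x5 v x6) ^ (x2 v x5 v x7) ^ (x3 v ~x6 v ~x7)
CLAUSES = [(1, -2, -3), (1, -5, 6), (2, 5, 7), (3, -6, -7)]
BACKDOOR = [1, 2, 3]          # assumed special variables to branch on

def simplify(clauses, assignment):
    """Apply a partial assignment; return remaining clauses, or None if violated."""
    out = []
    for clause in clauses:
        lits, satisfied = [], False
        for lit in clause:
            var = abs(lit)
            if var in assignment:
                if (lit > 0) == assignment[var]:
                    satisfied = True   # clause already true under the assignment
                    break
            else:
                lits.append(lit)       # literal still undecided
        if satisfied:
            continue
        if not lits:
            return None                # empty clause: this branch fails
        out.append(tuple(lits))
    return out

def two_sat(clauses):
    """Polynomial-time 2-SAT check: unsatisfiable iff some x reaches ~x and back."""
    variables = {abs(l) for c in clauses for l in c}
    graph = {}
    for c in clauses:                  # clause (a v b) gives ~a -> b and ~b -> a
        a, b = c if len(c) == 2 else (c[0], c[0])
        graph.setdefault(-a, set()).add(b)
        graph.setdefault(-b, set()).add(a)
    def reaches(src, dst):
        seen, stack = set(), [src]
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            for w in graph.get(u, ()):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False
    return all(not (reaches(v, -v) and reaches(-v, v)) for v in variables)

def backdoor_solve(clauses, backdoor):
    """Branch on all assignments of the backdoor; each branch is a 2-SAT instance."""
    for values in product([False, True], repeat=len(backdoor)):
        rest = simplify(clauses, dict(zip(backdoor, values)))
        if rest is None:
            continue
        assert all(len(c) <= 2 for c in rest)   # the distance-from-triviality property
        if two_sat(rest):
            return True
    return False
```

The exponential cost, 2^3 branches here, depends only on the number of special variables, which is exactly the "distance from triviality" of the instance.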

Congratulations! You have now learned a lot about solving NP-complete problems in practice. You have learned about search tree optimizations, you have learned about pre-processing, and you have learned about other techniques such as fixed-parameter tractability, and this is huge. Now, of course, in this unit, I could only teach you the very basic techniques. So you might be wondering, what is the state of the art for solving NP-complete problems in practice? I would like to give you a few examples and numbers. For vertex cover on practically relevant inputs, you can sometimes find the vertex cover for graphs with over 10,000 vertices. For independent set and clique, independent sets and cliques have been calculated for many graphs with over 1,000 vertices. And for traveling salesman, which is very close to shortest tour (and actually sometimes people also solve shortest tour, so for traveling salesman you really have to be specific about what kind of problem you are talking about), there's one example where somebody solved an instance with 85,000 vertices. Think about what this means for the four computer scientists that we met throughout this course. Alice started out not being able to solve vertex cover for a graph with 10 or 20 vertices, and then she learned about NP-completeness. Her situation was actually quite dire, but then we showed her that there are search tree optimization and pre-processing, and if those techniques are applied in the right way, then Alice can certainly expect to be able to solve her problem if she is working with 500 or 1,000 vertices, because the thing is this: the instances for vertex cover that come up in practice are actually much easier to solve than worst-case instances. One example is this: the practical instances just mentioned went up to 10,000 vertices. I think Alice can be pretty happy. What about Bob and Carol?
Bob and Carol, of course, started out being as unhappy as Alice, because they also had impractical algorithms, and then they found out about NP-completeness. Of course, they can also use search trees and pre-processing, but their problems turn out to be harder to solve than vertex cover in general. Even so, I think they should be a little bit happy, because for the problems that they try to solve in genetics and in finance, I think being able to work with about 1,000 vertices should be okay. And again, we have to assume that, of course, they are working with inputs that are more easily solvable than worst-case inputs, but usually in practice, again, that's the case. What about Dave, whom we met very late in this course? Well, when we met Dave, we showed a very complex proof that his problem was NP-complete, so he started out very unhappy as well, I would assume. And actually, I think he stayed unhappy throughout this unit, because whenever we came across shortest tour or traveling salesman, we always had to say, well, we're not really quite sure how to do this; this is an open research problem. But then again, since this problem is so highly relevant in practice, there have been lots of developments regarding sophisticated algorithms that work very well in practice, so we can basically put it like this: from a theoretical perspective, Dave has to be very, very unhappy, but if he uses sophisticated algorithms, then actually Dave can be the happiest of all, because the algorithms that have been developed for traveling salesman work extremely well in practice. And finally, of course, we should also say a little bit about SAT. For the satisfiability problem, there are annual competitions, and these competitions, of course, are based mostly on instances that occur in practice rather than on instances that are deliberately hard. And even so, each year, instances with over 1,000 variables are usually solved. Now SAT is a problem that behaves very differently.
There are some instances of SAT, even with 10,000 or 100,000 variables, that are very, very easy to solve, because pre-processing is very effective for these instances. And then there are other instances of SAT that have just a few hundred variables and are super, super hard to solve; there, you usually need to try a lot of different packages and techniques. But nevertheless, when you come across an instance of SAT (and there are certain practical situations where this is the case), then you shouldn't despair. If you're working with real-world instances of SAT, it's usually worth a try to solve them.

So congratulations, you have now mastered a basic survival course for NP-completeness. You have learned about advanced algorithmic techniques such as intelligent search trees, pre-processing, and other advanced techniques such as fixed-parameter tractability that will help you solve NP-complete problems in practice. Now, this material is very deep. Not many people, at least in my experience, even understand what NP-completeness means, but even fewer people know about these tools and techniques that you can use to poke holes, or find loopholes, in the laws of NP-completeness. So knowing these techniques, and possibly learning more about them, can really make you stand out whenever NP-complete problems need to be solved. Of course, even though these techniques are very powerful, sometimes it's just not possible in practice to solve an NP-complete problem. As I hope you're expecting by now, we still won't give up. Rather, in the next unit, we will see if it helps us when we relax a little bit, and instead of looking for the best possible solution, are content with solutions that are not quite optimal. In other words, next time we'll get sloppy.
