cs313 ยป

cs313 unit 6.2

01 l Dealing with NP Completeness

You now know the basics of dealing with NP-complete problems or NP-hard problems if we're talking about an optimization problem. Designing intelligent algorithms to find an exact solution, or, failing that, accepting approximate solutions, or even incorporating randomness into the algorithms. Are those all of the possible ways that you can deal with an NP-complete problem? Of course not; there are, for example, combinations between approximation algorithms and randomization algorithms. And, of course, there are also algorithms that you can use if you don't demand performance guarantees because the algorithms that we have analyzed, have always provided a certain guarantee as to how they would perform. If you don't require such performance guarantees, there's an arsenal of other techniques out there, known as heuristics. Some concrete examples for that would be evolutionary algorithms, genetic algorithms, simulated annealing, local search, meta-heuristics, and many more. Be warned, though; just because some other algorithms don't have any provable performance guarantees doesn't necessarily mean that they are simpler or easy. Rather, successful implementations of any technique for an NP-complete problem require a lot of thought. And I think the general rule here is finding good solutions to hard problems requires a lot of thought no matter what technique you're using.

02 q What You Know

Now let's take a step back and see what you have learned so far. And the way we're going to organize this is that we're going to draw pictures of so-called complexity classes. And a complexity class is basically a collection of problems that can be solved under given constraints or resources. So you take a bunch of problems, you introduce a constraint such as the one we have seen with NP where we said must be solvable in non-deterministic polynomial time, and then you sort problems. Some problems will fall into the complexity class, other problems will basically just fall outside of the complexity class. And of course you already know some complexity classes very well. So let's do a little warm-up quiz here. So you already know the class P from the previous units. And I just want you to quickly fill in if the complexity class P is the class of all problems that can be solved in polynomial or exponential time on a deterministic or non-deterministic RAM.

02 s What You Know

Of course, you know that answer. P is the class of all problems that can be solved in polynomial time on the deterministic RAM.

03 s What You Know NP

And again, that is probably very familiar to you by now. NP is is the complexity class of all problems that can that can be solved in polynomial time or in non-deterministic RAM. That's what the N here stands for; now you might have found this quiz rather easy, but actually that you're so familiar with P and NP I think it's something special, because I find that not many people actually understand what P and NP stand for, and what the difference is between the two. Now if you really want to impress a theoretical computer scientist, of course you have to be a little bit nit picky. First of all, P and NP formally are defined for decision problems, so here we said all problems, but actually, if you're very formal, it's only for decision problems, and the other thing is that originally, those classes were not defined on the RAM but rather on a computer model known as a Turing machine, named after the computer pioneer, Alan Turing. Now having a precise definition doesn't really change anything about P and NP the way that we were talking about it, but when you get down into the details of these classes, it might actually matter that you have a Turing machine here instead of a RAM and that you're talking about decision problems instead of optimization problems. But for here, I think it's just fine to think of it as the complexity class of all problems that can be solved in polynomial time on a deterministic or non deterministic RAM.

04 q EXPTIME and NEXPTIME

You might remember from the second unit that we can draw a matrix like this. Where here in the columns we have what a deterministic RAM can do, and here we have what a non-deterministic RAM can do. In the rows we have polynomial time and exponential time. You already know 2 complexity classes that go into here. One is P, and the other one is NP. There is actually also names for these classes up here. This one is commonly referred to as EXP or exptime. You can guess what that stands for. This one over here, guess what, is called nexptime. Since we are this far in the unit, here is a little challenging quiz for you. Because you can actually arrange all of these complexity classes here in a sort of hierarchy. Meaning that, for example, all problems that fall into this complexity class here automatically also fall into this complexity class here and this complexity class here. It's basically a hierarchy or containment of the complexity classes. I am going to give all of the complexity classes over here a number. 1, 2, 3, and 4. What I would like you to do is to enter the number of the respective complexity class so that you have built this hierarchy over here.

04 s EXPTIME and NEXPTIME

So each complexity class here follows a number of conditions or has a number of conditions that define it, so P is defined as being the class of all problems that can be solved in polynomial time on a deterministic RAM, and NEXPTIME, for example, is defined as the class of all problems that can be solved in exponential time on a non-deterministic RAM. Now when you want to arrange those classes into a sort of hierarchy, then you always have to think about which conditions are more restrictive than others. And I hope it's clear by now that determinism is more restrictive than non-determinisms. So basically, we can say non-determinism is stronger than determinism. Of course, this is very hand weary, so don't show this to a theoretical computer scientist, but for here it is okay to think about it that way. Now what about exponential time and polynomial time, I think that is a very clear one. So exponential time clearly should give you at least as much power as polynomial time, and this basically gives us the hierarchy, because polynomial time is clearly the most restrictive one, and non-deterministic exponential time is clearly the most powerful one. So we can already say here that P goes into the very inner circle whereas NEXPTIME goes out here. Now what about NP and exponential time on a deterministic machine? When we talked about non-determinism, we already found out that polynomial time non-determinism can always be simulated in exponential time determinism, which means that exponential time on a deterministic machine is at least as powerful as non-deterministic time on a polynomial time machine. So NP goes into the second circle here and EXTIME into the third.

05 q Does EXPTIME equal NEXPTIME

Now let's do a little bonus question here. And this is, again, a rather challenging one, but I think you can figure it out. So exponential time, as you know, is time of O(2) to the power of some polynomial of n, so we'll just write this as p(n). And, of course, p is some polynomial of n, and this here is also a polynomial of n, and this is O(2^p(n)). My question for you is, given that we can simulate nondeterministic polynomial time and deterministic exponential time, does that imply that these two classes here-- EXPTIME and NEXPTIME--are actually the same?

05 s Does EXPTIME equal NEXPTIME

And the answer here is no, we cannot. And the reason is as follows; so when we simulate nondeterminism on a deterministic RAM, what we have is here--a polynomial-- and here we have 2 to the power of that polynomial, which essentially was needed for trying all possibilities for using the "if better" function or nondeterminism, if you will. Now if we tried the same thing over here, what we would have--we would have a nondeterministic machine that runs in 2^p(n) time. Now if you wanted to simulate this on a deterministic RAM, what you could have in the worst case is a running time of 2 ^2^p(n), which does not fall into this category here anymore because we said that exponential time was 2^p(n) and not 2 to the power of some exponential function of n. So you must not make the mistake that simply because we can simulate NP in exponential time, we can also simulate nondeterministic exponential time in exponential time. But given how bad these running times here are, you will, of course, not come across them very often in practice.

06 l Games

Which actually brings me to my next question. Now, we have always said that NP-complete problems are very hard or challenging problems. Are there problems that are even more difficult to solve than NP-complete problems? And the answer here might surprise you. Actually, finding an optimum strategy in most games, such as chess or checkers, turns out to be harder than NP-complete. Of course, when I say chess or checkers, I do not mean the normal 8 x 8 board that you're playing on, but rather its generalized versions of those games where you can have an arbitrarily large board. The reason, of course, being that if you fix this as 8 x 8, then the problem size or input size is always fixed. So a concept like NP-completeness does not apply; you always need an input size. But if you take the dimensions of the board as an input size, then certain questions about these games become provably harder than NP-complete. So the problem of determining for a given game position on an n x n chessboard: if white can force a win, which means that no matter how good black plays, white can always find a strategy to win. Asking that question is believed to be much harder than solving an NP-complete problem. Now, I say "believed to be," because this hardness requires that P does not equal NP. If P equals NP, then, actually, many variants of this question here and their different ones will also be solvable in polynomial time. But in the case that P does not equal NP, this problem here is much harder than NP-complete problems. There are even some problems that are provably harder than NP-complete problems. So even if P were equal to NP, these problems would be harder. And I'll give you 1 example for that.

07 q Roadblock

And the game that I'm going to show you is called "Roadblock." Roadblock is a game for 2 players. It's played on a graph with end vertices where the edges are colored. And coloring--well, in a way, it's similar to Shortest Tour. So in Shortest Tour or Traveling Salesman, each edge had a number or distance assigned to it. And here each edge has a color, so you can have red edges, blue edges, yellow edges, any number of colors that you desire. And each edge has exactly 1 of those colors. Now, some of the vertices also have a label. Some of the vertices have a label that says, "player 1 wins," and others have a label that says, "player 2 wins." So the situation might look something like this. So here's the graph with end vertices, in this case, 11 vertices. The edges are colored either red, purple, or blue, and some of the vertices are labeled. Here we have the 2 vertices that say "player 1 wins," and here we have 2 vertices that say, "player 2 wins." So what are the rules of the game? Both player 1 and player 2 start out with a number of playing pieces that are also specified in the beginning. So each player has a number of pieces at specified vertices. So in this example I'm drawing here, it might look something like this. So player 2 might have a piece here; player 1 might have a piece here. Player 2 gets another piece down here, and player 1 gets another piece up here. So player 1 starts playing, then player 2 moves, then again player 1. So they play in turns, and here are the rules of the game: In any turn, you can take 1 of your pieces and move it along the edges of the graph. But you can only move your piece along edges that have the same colors. So you can choose the color, but you must stay on that color. So, for example, player 1 could move this piece here, either to this vertex down here or this 1 down here or this 1 over here; that would all be fine. Player 2 here, for example, could even travel 2 edges in 1 turn, or 3 edges, But what player 2 could not do is travel 1, 2, 3. Because then the color of the edges would change, along which player 2 travels. Each time you have a move you can change your color, but not during a move. And there is a second rule, and that is you cannot jump over other pieces. So player 1 here, for example, cannot travel along these red edges here because player 2 has a piece here. And that is why it's basically called "Roadblock," because player 2 is blocking this road here for player 1. But, of course, player 1 could go here or here. How do you win this game? That is very simple. Player 1 wins the game if player 1 manages to place 1 of his or her pieces on a vertex that says, "player 1 wins." And player 2, likewise, wins when player 2 places his or her pieces on 1 of the vertices that say, "player 2 wins." It can be any graph, any combination of labels of the vertices, any edges, any color combinations of the edges. So you're given this configuration here as an input. The question is, can player 1 always win? Or, in other words, can player 1 'force' a win? And what I mean by that "is player 1 being able to force a win," means that no matter how smart player 2 plays, what moves he or she makes, how intelligent and whatnot, player 1 will always be able to make moves that will allow player 1 to win. So given this graph up here, can player 1 always win?

07 s Roadblock

And the answer here is yes; player 1 can always win this game here that I gave you. And 1 strategy for player 1 to do this, for example, is to move this piece over here to here. Now it's player 2's turn, and player 1 is now threatening to win in the next move by moving this piece here to this piece here. Now, player 2 could try and block this vertex here, but player 2 can only do this with this piece here. If player 2 were to move his or her piece over here, then player 1, in the next move could, however, move this piece here to that vertex. So no matter what player 2 does, player 1 will win in the next move by either moving this piece here to that vertex, or being able to move this piece here to that vertex. Player 2 could also use this piece here to block this vertex down here, but that would still keep this 1 here free, and player 2 also cannot reach any of the vertices that would let player 2 win. So for this initial configuration here, player 1 can always force a win. Now, if you figured out that quiz yourself, you should be very, very proud of yourself. Asking this question: "Can player 1 force a win?" for a given instance of Roadblock on a graph with end vertices is provably harder than any problem in NP. Which means that this problem requires exponential time to solve, even if P were equal to NP. So even if you showed this, or had a nondeterministic gram available to solve this problem, you would still require exponential time. And that is why Roadblock is 1 example for a game, or asking a question about a game, that is harder to solve than anything in NP.

08 l Time for Roadblock

So the Roadblock problem is know to be exponential-time complete, which means it's provably harder than any problem in NP even on a nondeterministic machine. And the proof does not depend on whether P=NP or not. The proof is certain; Roadblock is harder than any problem in NP. So it's harder than, say, vertex cover. Now why is that? Why can't nondeterminism help us to solve Roadblock? Well, it can help us, but it apparently can't help us enough to make this problem solvable in NP. So there's several ways to explain this, and 1 of the differences between NP and problems that are harder than NP is that in NP, or, more concretely, in a problem like vertex cover, you're dealing with a static input. You're given 1 graph, and for that graph you're trying to solve 1 question. And that question is: Do you have a vertex cover of a certain size? Asking for an optimal strategy in a game like Roadblock is a bit different because this game is played against a player. So the question kind of is: After player 1 has made a move, what can player 2 do? And player 2, of course, has a number of possible reactions to what player 1 has done. You need to take all of these possibilities into consideration. How player 1 makes the second move or the third move always depends on what player 2 is doing in between. And that is why this problem here is intuitively exponential-time complete, because after every move that player 1 makes, you have to consider all of the possibilities that player 2 makes. And there's no way around that, instead of here, where we can just guess a small vertex cover. There's a description for this that is a little bit more formal. You can say that NP is kind of the set of problems that have a short proof. And by short, I mean a polynomial-time verifiable proof. So, for example, for vertex cover, when I tell you I have a vertex cover that only uses 5 vertices, and you ask me, please show me that vertex cover, well, I can just specify those 5 vertices to you, and you can check yourself that they will make up a vertex cover for the input graph. Now, for a game like Roadblock, that is not that easy, because if you said, "Player 1 can always win this game," and I asked you--I wanted proof of that, then we again get into this situation that the proof of the optimal strategy is exponential in size. Because that proof will basically start out by saying, okay, this is the first move that player 1 should make, but then there are a couple of reactions that player 2 could have to that move. So, let's say it branches out; so this is what player 1 is doing, and this is what player 2 is doing. Now, for each of these reactions of player 2, I will again in this proof have to specify what player 1 is supposed to do. But then, as you see, it branches out. Player 2 again has a number of possible reactions that it can have to this move. For Roadblock, if you think of this as a large search tree, it doesn't suffice to show that there's just 1 single path that leads to a solution, but basically you have to show that any path here in this tree always leads to player 1 winning. So for vertex cover or any other problem in NP and, of course, this is again talking very informally, it suffices to show 1 path in the search tree. Whereas for a problem like Roadblock, the proof that, for example, player 1 can always win, would require you to show the whole search tree and not just a single path, and that is why problems such as Roadblock are provably harder than problems in NP.

09 q Hierachies

Now let's get back to the general complexity classes. So we have already drawn this picture here--the hierarchy of P versus NP, versus EXPTIME, versus NEXPTIME. And, of course, there are other complexity classes that we have gotten to know in this course. So there's the class of fixed-parameter tractable problems. Then we looked at problems for which there are or are not approximation algorithms. So we had problems for which there was a PTAS, we had some problems for which there was a constant factor approximation, and we also mentioned that there are problems such as clique and independent set for which there's no constant factor approximation unless P=NP. And then, finally, we looked at randomized algorithms where we said there were Monte Carlo algorithms--those were the problems that were guaranteed to run in polynomial time but only with a certain probability give us a correct solution. And then we also mentioned Las Vegas algorithms which are algorithms that guarantee us a correct solution, but the running time is only polynomial with a certain given probability. Of course, we can also start building hierarchies here, so we can do a hierarchy for approximation, and we can do a hierarchy for randomized algorithms. And guess what--this is going to be our next quiz. So as in the quiz before, when we talked about P, NP, EXPTIME, and NEXPTIME, I'm, again, giving numbers to these complexity classes here, and, again, I would like you to enter the numbers here so that we form a nice hierarchy.

09 s Hierachies

I suppose that wasn't too difficult for you. PTAS clearly is the most powerful approximate ability, because with PTAS you can get any constant factor approximation that you want. It is the smallest bubble in here. Then we have the constant factor algorithms. So these here. Then finally we have all of those problems for which there's neither a constant-factor algorithm nor a PTAS up here. We already briefly mentioned that any Las Vegas algorithm can be turned into a Monte Carlo algorithm, and for that reason the complexity class of all problems for which there are Monte Carlo randomized algorithms, contains the complexity class of all problems for which there are Las Vegas algorithms.

10 q Problems in NP

Now, if we put all of this knowledge together and just look at the problems that are in NP then there's a lot of different possibilities for the properties of that problem. So either the problem if it's an NP could be in P or this NP complete or it might even be something else. Now, we haven't really talked about this something else here, but they are some problems, which are believed to be somewhere in between P and NP at least if P does not equal NP, of course. One example here, for example, would be factoring two large numbers. Now, if your problem is NP complete, there are still very very many possibilities. Your problem could be fixed-parameter tractable. Your problem could have an approximation algorithm and that approximation algorithm might either be a PTAS, a constant-factor approximation algorithm, or also something else, and again, we haven't really talked about this something else, but for example, there are some approximation algorithms where you can have a logarithmic approximation factor and then there are randomized algorithms, Monte Carlo algorithms and Las Vegas algorithms, and there are many many other things that we haven't touched upon so really a lot of different complexities. Now, one final thing I should mention, they are two notions here especially if you talked to somebody who has studied theoretical computer science. They will nitpick you about the NP completeness, because NP completeness is technically only defined for decision problems. So if you're talking about optimization problems and you want to be very correct then you should call that problem an NP hard problem. But many people use this term interchangeably, and actually, when you are working in practice on hard problems, it doesn't really make much of a difference at least in my opinion, but it is not the most precise way to say it so decision problems, NP complete, optimization problems, NP hard. What would happen if we were able to show that P equals NP? Would then each single class we have shown here become P? Or would the classes separate? And what I mean by this is could be then showed that not every NP complete problem is fixed-parameter tractable or not every NP complete problem has an approximation algorithm. And more than one of these here can be correct.

10 s Problems in NP

And there are two correct answers here. So the first one is if we were able to show that P=NP, well then, each single class here would lie in P because these are all problems that lie in NP as we said. So, all of these classes here would become P, which means that if we were indeed able to show that P=NP, then this whole picture here could just be written like this.

11 l The Complexity Zoo

Now, if the image of complexity classes that are just true look a bit complicated to you, I would have to say you haven't seen anything yet, because, of course, they are many many more complexity classes out there. Basically, any type of machine and restriction that you can think of forms a complexity class. And a nice collection of this, if you're interested in that, is the complexity zoo initiated by Scott Aaronson, a theoretical computer scientist at the MIT, and here we have a nice formatting of this document as a PDF, but it is also a website. So Google the complexity zoo. It's really interesting to see. And as you can see here in the glossary, there are many many many more complexity classes than ones that we talked about. So let's just see if we can find those that we already know. Here's FPT. This is the class of fixed-parameter tractable problems. Here is P. Down here somewhere is NP, but you see it's already difficult to find even those classes that we know, because they are so many different other ones. And they have lots of different origins, so there are some classes that are related to approximation, some classes that are related to randomization, and many classes here are also related to special models of computation that we haven't had the time to cover in this course. So there are some models of computation that work with circuits for example or even more exotic models such as quantum computing. The bad news about all of these classes here or all of the classes here in this document is that we don't really know much about the relationships between them. We know some, but as you have seen with P versus NP, there are lots of open questions here, which I think is also a good news for you. There are still lots and lots and lots of things to explore in computational complexity.

12 q Properties of our Problems

Now, I don't want you to feel intimidated by all of these complexity classes rather you should be proud of yourself and I would say congratulations to you for having learned so much in this course about computational complexity where you met Alice, Bob, and Carol and later, we also met Dave We didn't know anything about their problems. We didn't know if there was a good algorithm for vertex cover or if we just couldn't find one. Similar for clique and independent set and shortest tour. Later, we also learned about SAT and then we started out with the concept of NP completeness, which suggested that many of these problems here are hard. Then we went on to search trees to show how some of these problems nevertheless can be solve efficiently, and we introduced the concept of pre-processing. We showed that some problems are fixed using parameter tractable using either the size of the solution or some parameter such as distance from triviality as a parameter. We talked about constant factor approximation algorithms, the concept of a PTAS, problems for which there's likely no constant factor approximation algorithm unless P=NP. We talked about Monte Carlo randomized algorithms. We talked about Las Vegas randomized algorithms. And we talked about which of these problems, although worst-case analysis might suggest otherwise are usually very well solvable in practice. This is my final quiz for you in this unit. What I would like you to do is think about all the things you now know about these problems what you know about vertex cover, what you know about clique and independent set, about shortest tour, and 3-SAT and then just make check marks here on what you know.

12 s Properties of our Problems

And I think you'll agree that you now know a lot. So you know all of these problems here to be NP complete, which you probably didn't know before taking this course. We've also investigated intelligent search trees for vertex cover and for clique and independent set. We, unfortunately, were not able to find one for shortest tour, but we know one for 3-SAT. We talked about pre-processing for vertex cover, and we talked about pre-processing for 3-SAT. And we showed that vertex cover is fixed parameter tractable using the size of the solution, and we showed that shortest tour could be fixed parameter tractable as well as 3-SAT using a measure known as distance from triviality. We also talked about the constant-factor approximation algorithm for vertex cover. We showed that there is no such algorithm for clique and independent set. We had a constant-factor approximation algorithm for shortest tour. We did not touch upon 3-SAT unfortunately here. Regarding a PTAS, we know there's non-available for vertex cover. We also know there is non-available for clique and independent set unless P equals NP. We mentioned briefly that there might be some available for special cases of shortest tour but not in the general case also not for 3-SAT. Then we have the Monte Carlo algorithm for 3-SAT. We did not go much into Las Vegas algorithms here, but what we then also did we mentioned that all of these problems here are usually much better solvable in practice than the worst-case analysis and their NP completeness suggests. Now compare this table of knowledge with what the four computer scientists and you knew before taking this course. And I think you now know a lot about challenging problems but also about the techniques that you can use to solve them. So it's of course not all good news, because in general, these problems are hard, but using this arsenal of techniques here, you'll usually be able to do something about them in practice as others halve before you, and I think that can really really really make you stand out among other computer scientists because not many computer scientists understand the concept of NP completeness alone, but even those that do once they encounter an NP complete problem, they will tend to just give up or use some random algorithm, not a randomized one, a random one where as you now know about these techniques here and can use them to try and tackle those problems.

13 l Goodbye Alice Bob Carol and Dave

So, I think our four computer scientist can be rather optimistic and that is also why they are all smiling a little bit here. So, now, it's time to say goodbye to Alice, Bob, Carol, and Dave and wish them all the best for their jobs because in the next unit we are going to take a look at problems that are far worse that anything you have encountered so far. In the final unit of this course, we are going to look at problems that are worse than NP complete problems, worse than games such as road block, and worse than exponential time complete. We are going to explore the very limits of computation and look at problems that no computer can solve ever, so see you next time.