cs313 »

So welcome to Unit 2. In the last unit we encountered three computer scientists as I hope you'll remember, Alice, Bob, and Carol. And as you'll recall, all three were working on rather tough problems for which they weren't able to find an efficient algorithm. Alice was working in telecommunications, and her task was for the network of her employer to find the minimum number of monitoring devices that needed to be installed to monitor all cables in the telecommunications network. Bob was working in bioinformatics, and his job was to find clusters of data in genetic analysis. And finally, Carol was working in finance and she has been tasked with selecting a portfolio of assets that consists of as many assets as possible but it has to observe a number of restrictions, which assets she can put together in one portfolio. Now, at the end of the last unit, we stated these problems as problems on graphs and we're going to do a little recall quiz here to see if you remember which problems they were working on when we stated this as a graph problem. The three problems we're looking at were called independent set, clique, and vertex cover and all of those three problems took as input a graph with n vertices. For independent set, you were then looking for the maximum number of vertices where none of those vertices you have selected are connected to each other. For clique, you were looking for a maximum number of vertices where all of them are connected. And for vertex cover, we were looking for a minimum number of vertices to cover all edges meaning that every edge must have at least one endpoint in the set of vertices that we selected. So here's our first quiz and I would like to see if you remember which of these computer scientists, so Alice working in telecommunications, Bob working in bioinformatics, and Carol working in finance, which of these computer scientists were working on which of these problems here? And they way I want you to answer this is for each of the problems just select if that was the problem that Alice was working on, or the problem that Bob was working on, or the problem that Carol was working on. And the same here for clique and vertex cover.

And the correct here is that Carol was working on independent sets because she had to find a number of investment opportunities that were independent of each other. Bob was working on clique because he was trying to find cluster of genes that are showing similar expression patterns or similar behaviors, and Alice finally was working on vertex cover because she was trying to find a way that her telecommunications company could monitor all of their network as efficiently as possible.

The question that we are, for that matter, Alice, Bob, and Carol were stuck with at the end of the previous unit was that they had only been able to find exponential time algorithms for these 3 problems here, and those algorithms did not show any acceptable running time. So the question they were stuck with is, did they just not think hard enough to find good algorithms for their problems, or are their problems intractable, which would mean that it's not even possible or at least highly unlikely to find a polynomial time algorithm for those problems? And this is what we'll investigate further in this unit. And I'm not yet going to reveal the answer to you but what I am going to tell you is that without realizing it by asking this question, if their problems are tractable or intractable, Alice, Bob, and Carol are currently pondering what is by many seen as one of the toughest questions in computer science-- the famous P vs. NP problem. And in this unit, I want to show you 4 things. First, what these letters mean. Second, what the problem is actually all about. So what this here means. Third, what Alice, Bob, and Carol have actually to do with a P vs. NP problem. And then, last but not the least, how you could become an instant millionaire. That's not bad for one unit, is it?

We're going to start off this unit with a small technical detail. When we considered our three problems, vertex cover, independent set, and clique, we always said that we were looking for the best possible solution. So, for example, for the largest independent set or the smallest vertex cover. This type of problem is known as an optimization problem because we're trying to maximize or minimize some value. In this unit, we're mostly going to work with a slightly different version of these problems called decision problems. So we're not going to ask, for example, what is the smallest possible vertex cover? But rather we're going to ask, does a graph have a vertex cover that is smaller than some number of k? And k is given to us in advance. So the main difference between an optimization problem and a decision problem is that for an optimization problem, we ask, find us the best possible solution? Or what's the best possible value that we can achieve? And in a decision problem, we have to ask, is it possible to achieve a value of k? And so up here the answer will be some number. And down here, the answer can only be yes or no. And there are two main reasons why we do this. First of all, it makes our lives much easier in some of the proofs that we're going to dive into. And secondly, it's also a little bit more accurate because when you talk about the P versus NP problem, although it's often stated for optimization problems such as those we have so far discussed, the whole theory has actually been developed for decision problems. Now, you're probably thinking what difference does it make to talk about an optimization problem versus a decision problem? Or is this going to make any significant difference, for example, with the question of tractability versus intractability and actually it doesn't really make that much of a difference, but I'll actually have you figure out the details here in our next quiz. And I would like you to think about four things and then give me your answer. So the first question is, if the optimization version of a problem turns out to be tractable, then what about the decision version of that problem? Do we know that decision versions for sure to be tractable, intractable, or can we really say? The second question I have for you is the same thing, only with intractability, so if we know the optimization version of a problem to be intractable, then what about the decision version, is that for sure, going to be tractable, intractable, or is this a case where we can't really make a clear statement having only this information here that the optimization version is intractable, and finally questions number 3 and 4 are exactly the other way around than questions 1 and 2, so here we have information about the decision version of a problem, so question number 3, if the decision version of a problem is tractable, then what about the optimization version? And question number 4, if the decision version of a problem is intractable, then what about the optimization version? So please give your answers here with one of each of the three possible choices here.

The correct answer is, if we know the optimization version to be tractable, then the decision version is also tractable. If we know it to be intractable, then the decision version is also intractable. The same goes if we have the information about the decision version. You see, the tractability of an optimization version and of a decision version are closely related. Now, two of these answers I think are more obvious than two others ones. I think the answer to question #1 is rather obvious because if the optimization version is tractable so you can find the best possible solution in polynomial time, then with that solution you can also very easily answer a yes or no question about the solution. The same thing is true I think for answer #4 because if the decision version is intractable then the optimization version also has to be intractable or if you could easily solve the optimization version then it would also be easy to answer a yes or no question about the solution. The answers to question #2 and #3 I think are a little bit less obvious. If you know the optimization version to be intractable, then of course it could be the case that just having to say yes or no given the size of the solution that you're looking for could be easier, that could be tractable. So you would have to show in some way that if the decision version were tractable, then the optimization version would also be easy to solve, which is basically what I've asked you to think about in question #3 here. One way to show this is and this is not very precise but just to give you an idea here that if you have a tractable decision version then you could ask multiple questions about the solution and still remain in polynomial time. So for example if you're looking for the best possible solution for vertex cover, you could first ask the decision version say for n/2 as the value of k you're looking for and see if you can find a solution of this size. If that is the case, you can decrease k or if you cannot find a solution that is that small then you have to increase k and by solving the decision version often enough so searching for that optimum k you can also solve the optimization version. So this is not that obvious to see but for now you can keep in mind that optimization versions of a problem and decision versions of a problem behave more or less the same with respect to tractability or intractability.

Now that we talked about the detail of optimization problems versus decision problems, we are ready to dive in. So how can we show that a problem is intractable? We already know how to show that a problem is tractable. We just have to find a polynomial time algorithm for it. But how do we find out if the problem is intractable? And there's actually two different ways we could go about this. The first way we could try to show intractability is by gathering evidence. What I mean by gathering evidence is this, we've already found out that Alice, Bob, and Carol were working on problems for which they weren't able to find a polynomial time algorithm, but all of their problems were connected, so if one of them managed to find a solution then all of their problems will be tractable, so one way we could be gathering evidence is if we just find more people working on more hard problems, and of course all of them are very smart people. If we could find more and more people who are also very smart working on very tough problems, but also no one of them finds a polynomial time algorithm. We could basically be saying, look, there are so many people who have looked at this problem, none of them has found out that the problem is tractable, so we think it must be intractable. Another way to go about this would be a mathematical proof. And what a mathematical proof would have to show is that no matter what kind of intricate, ingenious, sophisticated, complex algorithm we use for a problem, that algorithm will still need exponential time to solve the problem. So, what I would like you to think about for our next quiz is the advantages and disadvantages of each of these approaches here. And I would like you to think about two things. First of all, which of these approaches is probably easier to accomplish? And secondly, which of these two approaches here would you find more convincing to show that a problem is intractable? So please give me your two answers in these bubbles.

The answer here, although it might be a little bit subjective, but I think it's rather objective actually is that in terms of difficulty, it's much easier to just gather evidence than accomplish a mathematical proof in my opinion. The reason for that is that in a mathematical proof you cannot only say well these 10 people or these 20 people haven't found a good algorithm for my problem, but you would really have to show that any algorithm that anybody could ever come up with would still require exponential time which does not allow you to give any examples or just state the number of algorithms, but you have to go about it in another way that is much more complicated as you'll also see in this unit. In terms of convincing this of course, the mathematical proof is more convincing than just gathering evidence because here we know that no matter what happens the problem is intractable whereas here we could just have another smart person coming in who finds a polynomial time algorithm and then it suddenly turns out that the problem is tractable.

Now, before we continue, I would like to make one important point, and I don't know if this already occurred to you, but at least you might be inclined to think that it's rather obvious that the problem that Alice, Bob, and Carol are working on is intractable for one simple reason. Each of their problems has to look at an exponential number of solutions. For vertex cover, we have O(2^n) potential solutions that we have to look at because there are 2^n 0-1 assignments to the vertices for putting the vertices of a graph into a vertex cover. And the same thing is true for independent set and clique as we have seen in the last unit. Now, the question is does the fact that there is an exponential number of solutions already mean that the problem must be intractable. You could basically think this because having to consider an exponential number of solutions might mean that you have to look at each one of the solutions and that obviously is going to take exponential time. What I now want to show you is that this is not the correct intuition to have. Having many solutions does not mean that the problem is intractable. It can mean this, but it does not have to. To show you this, I'm going to give you one example where we have an exponential number of solutions but actually there is a polynomial time algorithm for our problem. What I would like you to tell me--in this graph here, how many different paths or how many different ways there are of getting from A to B if you cannot go backwards? You start out at A and once you've traveled across an edge you cannot go backwards. You always have to go forward, so how many ways are there to get from A to B? Of course this is a very simple warm up example. I'm going to draw a little arrows here just to make clear that we cannot go backwards.

The answer here is, of course, that there's just two different ways to get from A to B-- either you go this way up here or you go this way down here.

Now how about this little bit more complicated graph? Again, I would like you to figure out the number of paths or the number of different ways that you can take to get from A to B and the rules here are again the same-- you can only travel in this forward direction. I'm again going to draw the little arrows. So please enter your answer here into this box.

The answer here is that there is actually four different ways of getting from A to B. This up here is the first one, then you could go this way, then you could go this way, or you can go this way, which is four in total.

Now finally, I would like you to figure two more--this one down here and this one down here, and the rules again are the same. I'm not going to draw in the little errors just so that you can still see the image, but you still have to move in this direction here from A to B and the same down here. How many ways are their from A to B in this graph up here and how many ways are there to get from A to B in this graph down here.

The answer is that up here there's 8 different possibilities and down here, there's 16 possibilities. Now, there's two ways to go about this. One is to just do the counting, which can get very tedious at least in the graph down here, or the other one is to figure out a general rule which could be quite helpful and that rule is that any time that you want to move forward you have two different choices that you can make. Once you start in A, there's two different choices. You can go either that way or that way. If your say in this vertex here, you have two choices. You can either stay up here or you can go down here. Likewise when you're down here, you can go either this way or that way. Each time you move forward, you can make one of two choices and since here, you can make one of two choices except for the last time. In the last move, you'll always have to end up at B, so here we have 1, 2, 3 times that we can make two choices and then we cannot make a choice here anymore. And 2³ is 8 and down here, it's basically the same thing. We just have one more of these layers here where we can make a choice and so that's going to be 2? or 16.

Now, let's try to find a general rule for the number of ways that you can get from A to B. The number of vertices in each of the graphs that I have drawn is 4, 6, 8 and 10. When we have four vertices, there are two different ways to get from A to B. When we have six, there are four different ways. When we have eight, there are eight different ways. And when we have 10 vertices, there are 16 different ways to get from A to B. Now my question to you is what if we had 20 vertices and please again give your answer down here in this box?

The answer here is that there are 512 possibilities to get from A to B, and the way to figure this out is to notice that there's a general rule here. Each time that we add two vertices, the number of different ways that you can use to get from A to B doubles. We add 2 here, the number of ways to get from A to B multiplies by 2. We add 2 here and the number multiplies by 2 and so on and so on. To get from 10 to 20, we have to add two vertices 5 times, so we already know that there are 16 ways to get from A to B if we have 10 vertices. For 20 vertices, we add 2 five times, which means we multiply by 2 five times and that turns out to be 512.

If you play around with this a little bit further, you can figure out a general rule here, and the general rule is that if you have n vertices then the number of different paths from A to B is two to the power of n half minus one. And the way you can see that this formula is correct is you can plug in the four here so it's four over two which is two minus one and 2¹ that is two. And in the quiz, we just saw that each time we add two vertices here, we have to multiply by 2, and this is exactly what happens here because if we add two vertices here then the exponent here will increase by one. And in this way, you can work your way up to larger and larger networks of this structure. So what you can see here is that the number of ways to get from A to B grows exponentially as these graphs here get more and more vertices. And I'm now going to ask in algorithmic question, and the algorithmic question is a very simple one and that is what is the fastest way to get from A to B or the shortest way to get from A to B. And if you have had an algorithm's class, you might know that this is a problem that can be solved in polynomial time. So there are many algorithms actually that can figure out in polynomial time. What the shortest way is to get from A to B? But this means that having many possible solutions cannot mean intractability because here's the case where we do have many possible solutions. There are exponentially many ways to get from A to B, but if we ask an algorithm to figure out the shortest one for us, it does obviously not have to consider an exponential number of solutions to come up with the answer because, otherwise, there would be no polynomial time algorithm to figure out shortest paths between two points A and B. And so for vertex cover, clique, and independent set, this means that if we want to show these problems to be intractable, it's not enough to point out that there's an exponential number of solutions because that does not mean that there can't be no polynomial time algorithm.