cs313 »

cs313 unit 1.4

01 q Analyzing Alices Algorithm

Now that we have learned how to analyze algorithms let's look back to the algorithm that Alice was using to solve her problem which was the whole point of us learning about algorithm analysis in the first place. Here again is the algorithm that Alice was using. It's those five lines here. And although we already notice that it's a running time that grows exponentially we don't really yet have the precise running time stated in O notation. In order to do this, let's look at the individual lines and see how often they are executed. And then as a little quiz, I will ask you to figure out the running time of Alice algorithm using O notation. Let's start with line 1, and line 1 is just an assignment of a value. That's actually it takes zero times steps but we can also just say that it takes O(1), which just means that it's some constant although in this case the constant is zero but O(1) is a general notation to keep in mind. Anytime you want to state that an algorithm runs in constant time you can just write O(1). Now, line #2. When we have n communication centers. So N is going to be the size of the input in this case. Then we already figured out at the beginning that this line here is going to executed O(2^n) and I'm going to use this notation here, which you might not be familiar with to say that n here is the exponent. So what about line 3? So line 3 is going to be called each time that this loop here executes. So each time the loop of lines 2, 3, 4, and 5 executes we're going to call this function here. Now the question is how long does it take as to check if an assignment of 0 and 1 values is valid. So if you have a network that has n communication centers then in theory each communication center can be connected to any other communication center. So you have n communication centers and the question is how many connections can they have between each other? So this one could be connected to this one, this one here, this one here, and so on. And basically checking if an assignment of 0 and 1 values is valid means checking for each cable between two communication centers whether at least one of the communications centers is assigned a 1. The running time of checking if the whole assignment of 0 and 1 values is valid depends on the total number of cables. The question here is what is the maximum number of cables that we can have. And that is actually very similar to the running time of the algorithm that you analyzed in the quiz a few minutes ago because the first communication center can be connected to n-1 other communication centers and the second one is it can also be connected to n-1 other centers but of course we don't want to double count or actually it doesn't really matter if we double count or not because we're using O notation. But if we don't double count, then it's just n-2 cables here and this goes on and on and on. And from the algorithm that you analyzed before when we were discussing O notation, you already know that if you do the sum of n-1, n-2, n-3, and so on until you get down to 1 then that is O(n²). The worst case running time for checking if a given assignment is valid means going through O(n²) different cables and then for each of those cables checking if one of the communication centers that it is attached to has been assigned a 1. Now to figure out the running time of this line here, of course we first know it's O(n²) cables and then the question is how much time do we need to check an individual cable but for now I think without further discussion we can assume that it's constant time because it's just connected to two communication centers. There's probably an efficient way that we can implement this. If you were to prove the actual running time of course you would have to have a more detailed discussion here. But for now we'll just say each time this line here is executed it takes O(2^n) time. Now, what about line 4? Each time this line is executed, we have to do the sum of all the 1s in the assignment. And now the assignment concerns the communication centers. So we have to go through n communication centers and count how many 1s we find. So each time this line is executed, it will take O(n^2) time, which is linear time. And finally, line 5 is taking the minimum of the best possible solution we have found so far and the number of devices we have in the current solution to figure out if the current solution is a better solution than the one that we already have. But this of course takes only constant time because it does not depend on the size of the input. It always takes two values and produces a minimum. So as announced as our next quiz, I would like you to take this information and figure out what the overall running time of Alice's algorithm will be given a network of n communication centers. So which of these six possibilities is the correct running time of Alice's algorithm above?

01 s Analyzing Alices Algorithm

And the correct answer here is O(2nn²) and the reason why this is correct is as follows-- so the first line here takes constant amount of times so that's not going to be relevant. Now, this loop here lines 2 to 5 are executed 2n, so the question is each time this interloop here lines 3 to 5 is executed, how much time does that take. So checking of the assignment is valid, takes O(2n²), then if the assignment is valid, we need an additional O(n)+O(1), so an additional O(n) time because this here is constant, so we can ignore it. So the maximum time that lines 3 to 5 take is O(n²)+O(n)+O(1), so the largest growing term here is O(n²)--so the interloop takes O(n²) and that is executed 2n. So that's why the running time of Alice's algorithm is O(2nn²), and the good thing is that because we have our own notation, we could do this analysis without actually concretely stating how all of these algorithm is programmed. So as I hoped you see, it's quite a useful notation to have.

02 l Acceptable and Unacceptable Running Times

Alice is facing the issue that her simple algorithm while it's correct does not show an acceptable running time so we should probably say a little bit more as to what we would consider an acceptable running time, like here and what we would consider an unacceptable running time. And in theoretical computer science or at least for the first units of this course, we're just making a distinction between two cases basically. The first case here is polynomial running time. And polynomial running time means that the algorithm has a running time that can be stated as some polynomial of the size of the input. So, polynomial time would be an algorithm that has a running time of O(n), O(n²), but even with an algorithm with a running time such as O(n¹?) or worse would be considered a polynomial time algorithm and we would say that the running time of this algorithm is more less acceptable. Now, exponential running time, those are algorithms that have n in the exponent of their running time. So, O(2?) would be an algorithm that's exponential running time. O(1.1?) would be an algorithm with exponential running time and also something really bad such as O(10?) that would also be an exponential running time. What about an algorithm such as O(3?n²), that would also be called exponential running time because as you saw in previous quizzes, the 3? is the fastest growing term here. So, the dominant term here is an exponential term. So, running times like this would also fall under exponential running time.

03 q Polynomial or Exponential Running Time

Now, let's do a little quiz just to see that you can easily make the distinction between the polynomial and exponential running time. So I'm going to give you a couple of running times and I want you to tell me if this is a polynomial running time or an exponential running time. So we have O(2nlog(n)), O(2log(n)), O(1.00001n), n¹???, and then Alice's algorithm, which you'll recall had a running time of O(2nn²). So please for each of these functions is that the function for a polynomial running time or for an exponential running time.

03 s Polynomial or Exponential Running Time

So, here's the correct answers. The first one, 2^nlog(n). That is clearly an exponential running time because 2^n is exponential. Now, what about the next one, 2^log(n). Well, 2^log(n) is actually O(n) because... So, having the logarithm and exponent, well, actually cancel out the exponents. So, that is O(n). So, although it might look exponential, if you don't look closely enough, that is actually a polynomial running time. Now, O(1.00001^n), that is going to be a very, very slowly growing function, but nevertheless, it's exponential. So, this is the correct answer, and n^1000, that's basically the opposite. It's a polynomial that will grow rapidly fast and you would never want an algorithm actually with that running time, but nevertheless, it is a polynomial running time. Now, this is algorithm--well, you already figured that out I guess because it's 2^nn², that is an exponential running time algorithm, that's why it's really bad. Now, as you see with those two examples here, there's something to be said about saying that we consider polynomial running time to be acceptable and exponential running time to unacceptable, but in general, what you will find is that polynomial time algorithms tend to have a running time of n² or n³. So, there is not really any algorithm that I would know of, at least now that make sense that has a running time like this. So, it's kind of okay from empirical experience to call a polynomial time algorithm, actually one with an acceptable running time, and the same thing is with exponential running time. So, when you have an exponential time algorithm, you will not have one that has in the base of figure like 1.00001^n. For some algorithms, you might have 1.1 or 1.2, but even then, it's a fairly rapidly growing function. So, it's also okay to call exponential running time algorithms unacceptable in general.

04 l Bobs Problem

Now it's time for you to meet our second computer scientist Bob. Bob is working in biotechnology or bioinformatics and the problem that Bob is working on is he's doing gene analysis. He is looking at bunch of different genes such as this one here. Bob is basically trying to figure out how genes work together. Which groups of genes get activated, say when a cell develops a disease or when it's infected as oppose to the healthy cell. Bob will receive data from a laboratory that will tell him which genes work together or which get activated under a certain condition and which genes don't. For example these two genes here will get activated more or less under the same conditions. Also these two here, but these two here they appear to be kind of independent in laboratory research. Of course Bob does not only receive data on two genes but on a number of genes. And of course there are many more interactions so say group of genes that tends to work together quite a lot. And so since Bob is trying to figure out which groups of genes work together, what he is basically looking for is a collection of genes that are all connected to each other. I'll just give you one example. This here. These three genes are all connected to each other. This is kind of the type of group that he is looking for. What he would not be looking for is for example something like this. These four genes here. Most of them are connected to each other but not every single one of them is connected to every other. This gene here is not connected to that one. And the reason why he is looking for these clusters of genes is that when you find a group of genes where each gene displays a very similar behavior to every other gene, then you have a kind of functional cluster so a large set of genes that can be assumed to be involved in the same processes and you can then use this for drug targeting or figuring more out about the disease. This problem is a bit of a simplification of what you might encounter in the real biotechnology setting. For example you'd normally be content with these four genes here that are not all fully connected but looking for genes where every gene is connected to every other one is a bit more convenient to work with and actually it doesn't really make much of a difference from an algorithmic perspective so let's say it's realistic enough.

05 q Genetics Network

For our first quiz, what I would like you to figure out for this genetics network here is the largest group of genes that you can select so that every selected gene is connected to every other selected gene. What is the largest group of genes where every gene is connected to every other gene. You've already seen that we can find a group of three genes where every gene is connected to every other gene, but may be you can find a larger one. The way I want you to answer this quiz is just to check off the genes that belong to this group. Which genes do we have to select so that we have the largest possible group of genes where every gene is connected to every other gene.

05 s Genetics Network

The answer to that is we can find a group of four genes that are connected to each other by selecting this one here, this one here, this one here, and this one here and as you will see, this gene here is connected to those three. This gene here is connected to that one, that one, and that one. This one up here is also connected to this one, this one, and this one and similarly here the fourth one is also connected to the other three. That is actually the largest possible group of genes that you can find that are all connected to each other in this network. Now it should be noted that this problem is a bit of a simplification of what you might encounter in a real biotechnology setting. For example, normally in biotechnology you would be happy if you found a group of genes where most are connected to most others but here we'll be looking only for groups that are fully connected to each other so like this one here. The reason why we're doing that is that from an algorithmic perceptive, it doesn't really make much difference and it's just simply to work with the problem where we know we have to find groups of genes that are fully connected to each other rather than if we allow certain connections to be missing.

06 q Bobs Algorithm

Similar to Alice, Bob has been asked to design an algorithm that would find these large group of genes because as I'm sure you already have noticed in this quiz, it's rather difficult to find these large groups. You start out by finding small groups but once you have tried to extend them, often you'll find that one or two connections are missing and then you have to start all over again. This is a typical problem that you would like to use a computer to solve for you. How can Bob approach this problem. Well, actually, he could write an algorithm that is very similar to the one that Alice has used. We start out by setting the size of the largest group to zero because we haven't found any group yet, and then again we can try all assignments of the value 0 and 1 to the genes. A 1 meaning that we're considering this gene to be part of this large cluster of genes that are all connected to each other and an assignment of 0 and 1 to the genes is valid if all genes that have been assigned 1 are connected to each other. If that is the case, then the size of that group is again the sum of the number of genes where we have assigned a 1 and the largest group is the maximum of the largest group we have found so far and the size of the new group that we have just found. The main difference to Alice's algorithm are two things-- One is, it's a bit different which type of assignment we consider as valid and the other one is that now we're not looking for a group of genes or communication centers that is as small as possible, but now we are looking for a group of genes that is as large as possible. Since the algorithm is so similar to Alice's, I think it should be rather easy for you to figure out our next quiz and of course, that quiz is going to be to determine the running time of Bob's algorithm and of course, I would like to state that in big O-notation again. I'm going to reveal to you that the running time is going to be O of some number to the power of n times n to the power of some other number or the same number where an n is the number of genes in case you were wondering.

06 s Bobs Algorithm

And the correct answer here is that the running time of Bob's algorithm is 2^n n², which is just the same running time as the algorithm that Alice was using, and the reasons are also very similar. If you have n genes, then again there's 2^n assignments of 0 and 1 to the genes, and checking if an assignment is valid, so checking if all genes that have a 1 are connected will also take O(n²). That, of course, depends a bit on how many 1's there are in the assignment, but a worst-case analysis would be here that close to n genes have been assigned 1, so we need to check all of their connections which similar to Alice's algorithm, you already know there are about n² many of them. Bob, similar to Alice, has so far only found an algorithm that is totally impractical for the problem that he's trying to solve because it has exponential running time.

07 q Tractable and Intractable Problems

What both Alice and Bob have so far been able to find is--they have found exponential time algorithms for their problem and they haven't been able to come up with anything better so far. The question of course is does the fact that Alice and Bob have not yet been able to come up with a polynomial time algorithm mean that the problems they are working on are only solvable in exponential time or could there be some sort of polynomial time algorithm that they haven't found yet and there's a special terminology for that. Problems or computer problems where you only have exponential time algorithms are called intractable problems, and those compute problems where you can't find a polynomial time algorithm--those are called tractable problems. And this is going to be a rather obvious question to some of you I hope-- does the fact that so far we only know exponential time algorithms for the problem that Alice was working on and the problem that Bob was working on. Does that mean that their problems are intractable or in other words, can we already say whether Alice's or Bob's problems are intractable or tractable, and I would like you to tell me if you think that is the case, so if we can already say that their problems are intractable or if there's probably more information that we need.

07 s Tractable and Intractable Problems

The answer to that is we are not yet able to say if the problems that Alice and Bob are working on are tractable or intractable, and the reason for that is that we have only considered one single algorithm, but for a problem to be truly intractable, we would have to show that no matter what kind of algorithm we come up with, so if we don't use the simple algorithm like Alice and Bob were using but something more complicated, we would have to show that nevertheless, this algorithm still runs an exponential running time. Just by stating that there' s one algorithm for a problem that has exponential running time does not necessarily mean that the problem itself is hard; it could also mean that we just haven't felt long enough about designing an appropriate algorithm for that problem.

08 q Showing Tractability and Intractability

The question I would now like you to think about for a moment and this is going to be our next quiz is if you have a computer problem and you want to know if this problem is tractable or intractable, then I would like you to think about which one is actually harder to show or if it's both equally hard to show. If you think that showing the problem to be tractable is easier than showing it to be intractable, then I want you to check this circle here. If you think that showing intractability, so showing that a problem belongs here is easier than showing that it belongs here, and I want you to check this circle here. And finally, if you think it hardly makes any difference, then I want you to check this circle here.

08 s Showing Tractability and Intractability

The correct answer and again, this might be a little bit subjective, but I think in this case it's not very subjective. Is showing tractability is actually easier than showing intractability, and the reason for that is that if you want to show that a problem has a polynomial time algorithm, then all you need to do is come up with that algorithm. Now, that might be very hard, but nevertheless, once you have found a single algorithm for a problem that has polynomial running time, then you're done. You have shown that this problem is tractable. On the other hand, if you want to show that a problem is intractable--what you have to show is any possible algorithm, so not only the algorithms that you can come up with, not only the algorithms that somebody else can come up--any algorithm that somebody could come up with at any point in time must still have exponential running time. Just stating a single algorithm or even a hundred algorithms is not enough to show intractability. For that reason, I think show intractability is in a way much easier than showing intractability.

09 l Tractable vs Intractable

Now, we already talked about the notion of polynomial running time being associated with tractability and exponential running time being associated with intractability can lead to some borderline cases where you would probably argue. For example, if you had an algorithm that had polynomial running time of O(n¹?) and you had an exponential running time algorithm with running time 1.01^n, then actually it would be very hard to decide which one of these algorithms to take. It would kind of depend on additional information. What other constant factors here and what is the size of the n. As I mentioned before, this hardly occurs in practice and this hardly occurs in practice as well. Tractable problems are usually really tractable and intractable problems tend to be intractable in general, but in later units, we'll actually see how you can deal with exponential running times. The question we're now dealing with is both for Alice's problem and Bob's problem. Are they intractable or can we find a polynomial time algorithm that would show us that these problems are tractable? So far Alice and Bob really don't have a good idea on how to tackle that problem. That's why they actually decides to get together and discuss with each other the problems that they are trying to solve, and to that meeting, they also invites a friend of theirs, and that friend of theirs is Carol because as they have learned Carol is also working on a problem that so far she has only found an exponential time algorithm for.

10 q Carols Problem

Now it's time for you to meet Carol, and as I mentioned at the beginning of this course, Carol is working in finance and her job is to design algorithms that will optimize financial investments. Given recent market turmoils, her employer has asked her to come up with an algorithm that will help them design a secure portfolio of investments. The problem that Carol is looking at is actually quite similar to the one that Bob is looking at. She is looking at a number of investment opportunities for her company, for example, she could invest in precious metals such as gold, silver, or maybe even copper. She could also invest in stocks. So, for example, a copper mining company or in a silver and copper mining company or she could invest in an industrial manufacturer or also in some other bank. In order for the portfolio to be secured, she wants to spread her risk as much as possible. She does not want to invest in things that are too similar or too closely connected. For example, she wants to invest either in gold or in silver, but this line here basically tells us that she does not want to invest in both at the same time, and so she basically only wants to invest in one of those precious metal. If she invest in copper, she probably does not want to invest in the mining company because none of the copper price goes down so will the stock up the copper mining company. Similar here, this is a silver and copper mining, so either invest in this company or in those precious metals here. If the industrial manufacturer is actually a supplier to this company here, you probably wouldn't want to invest in both of them at the same and let's say that the bank down here is backing the two mining company, so you either invest in the bank or maybe also these two mining company separately but you definitely would not want to--well, actually lets just say you also didn't want to invest in two mining companies at the same time. In order to spread the risk as much as possible, Carol, of course, wants to find a portfolio that contains as many different items as possible because then she has spread out her risk, but at the same time, she must observe those rule. She cannot just select all of these possibilities but only possibilities that are not connected to each other. The way that is very similar to Bob's problem because Bob, when he was looking at the genes, he was looking for genes where every gene is connected to every other gene, and now Carol is looking for possible investments that are not connected at all to each other. What is the largest portfolio or how many items can Carol select at maximum so none of them are connected? I would like you to figure that out as our next quiz and I would like you to put your answer here in this box please.

10 s Carols Problem

And there again several possible solutions here, but the maximum number of items that Carol can select is at most 3 and I'll give you one example of this. For example, she could decide to invest in the manufacturer, which means that she cannot invest in this mining company here. Then she could maybe decide to also invest in this bank, which means she cannot invest in this mining company here, and then she could also choose one of those methods here. Let's say she also decides that she is going to invest in silver, but that means that copper and gold are out of the question. Again, she could have also chosen one of the other possibilities here, but three is really the maximum number of items that she can select. Similar to the problem that Bob was working on, this is of course a bit of a simplification, so when you're working on real work finance problems, you usually have constraints that are a bit more complicate and also of course, your networks will be much larger, but in essence that's a sort of problem that you might encounter in financial modeling and optimization--only a simplified version of it.

11 l Bob and Carol Meet

Now, Bob, Carol, and Alice who are going to come in later are finally having their meeting. Well, I should say something about Carol's algorithm, but Carol already has a bit more experience with algorithms than Bob or Alice. She has already figured out that a simple algorithm that you just try all the possibilities of selecting certain investment opportunities into her portfolio is just not going to cut it. Carol is really looking forward to meeting Bob and figuring out some new ideas. At the beginning of the meeting, Alice is not there yet. Bob and Carol started explaining their problems to each other. Bob is explaining to Carol how among a set of genes some of which are connected. He is looking for groups where every gene is connected to every other gene and Carol, of course, explains her problem where she is looking for potential investments that are not connected to each other. such as this one here. Now, in order to not have to talk about genes and investments anymore, we should use some common terminology for both of those problems and if you already have an algorithm's course, you might have come across a structure known as graphs. and graphs are basically just objects that are connected to each other. The network up here at would be called a graph and the network down here would also be called a graph. This here, the object would be called a vertex and of course, this, this, and all of these objects are vertices so this would also be a vertex, here, either the purple or the green ones and then the connections are called edges. All of the blue connections or some of those that I've colored green here every single one of them is an edge. An edge always connects to vertices and the whole structure is called a graph. And if you used that terminology, you can state the problems that Bob and Carol are working on using a similar language. Bob's problem is basically taking as input a graph with n vertices and the output that he is looking for is the largest set of vertices all connected to each other. What Carol is looking for, again the input is a graph with n vertices and the output is the largest set of vertices not all connected to each other that non-connected to each other. As you can see, once we use the common terminology, the problem that Bob is working on is actually very, very similar to the one that Carol is working on. The only difference is Bob is looking for a set of vertices where all are connected to each other and Carol is working on a problem where none of the vertices are connected to each other but everything else is the same. You are given a graph with n vertices, you are looking for the largest possible set, and all it's about is the connection that these vertices have to each other. In the future, to be better able to talk about these problems, we'll also give them names as we do with most problems Also, so that we don't always have to say this is Bob's problem and this is Carol's problem. What we will call Bob's problem is a problem named Clique because all of those vertices are very closely connected to each other just like in a clique of friends for example And we'll call Carol's problem independent set. Carol is basically looking for the opposite of a closely collected clique. Carol is looking for vertices that are not connected to each other. If those were friends, they wouldn't know of each other. Now, after Bob and Carol have explained their problems to each other and given them the name, they noticed since the only difference between clique and independent set is that in one we're looking for all vertices to be connected and on the other one we are looking for none of the vertices to be connected That problems are actually very, very closely related, and I will show you the idea that they come up with to figure out how similar these problems are. also from an algorithmic perspective.

12 q Clique

Let's try a graph here on this left side and then I'll have you figure out the largest possible clique that is in that network. It's going to be an easy example. Here you can check off these boxes and just make a check mark in each of the boxes that belong to the largest clique. The largest set of vertices that are all connected to each other.

12 s Clique

And it's easy to see that set is exactly 4 vertices, so it's this one here, this one here, this one and this one. We have 4 vertices that are all connected to each other.

13 q Independent Set

Now I'm going to draw a very similar--well, in a way very similar network for independent set. At least, I'm going to place the vertices at the same locations and we'll just how similar they are. And now, we're going to use similar quiz and this time, I would like you to check all vertices not that belong to the largest clique but to the largest independent set. Then you can make your check marks on each of the vertices that you think belong to the largest independent set.

13 s Independent Set

The solution here is that you can also find 4 vertices that form an independent set. This one here is rather obvious. You can also take this one in there, this one here. It's not connected to any of the other ones and this one here. Well, I've already placed the vertices in very similar locations, but as you can see, each vertex here has a corresponding vertex in the independent set, and of course, the way I've drawn this is not really by accident. There's a certain type of relationship between this network here and this network here.

14 l Good News for Bob and Carol

And to show you this relationship, I will do the following--for each of the edges that is blue over here I will draw in here and I will color them black and I'm going to do the same thing over here, so each edge that is blue over here I'm going to draw in black over here. Now, and I hope you can see it here even with my drawing that is not very similar, it turns out that this graph over here once I've added all the edges from this graph is the same as this graph here once I've added all the edges from that graph over here and this is exactly the connection between clique and independent set. If you have solved clique on a given graph, you take another graph where you draw exactly the opposite edges. Every pair of vertices that is connected by an edge in this network is not connected over here and vice versa but this means that finding the largest possible clique in one graph is basically the same as finding the largest possible independent set in the-- well you can call it an inversed graph where you connected exactly those vertices that where not connected over here and it's the same the other way around. If you have found an independent set in a graph that is as large as possible, then the same vertices will form a clique if you build the inverse graph. You connect all of the vertices that were not connected over here. This is actually a great news for Bob and for Carol because if Bob were to find a polynomial time algorithm for clique. If he were to determine that clique is tractable, then independent set would also be tractable because Carol could just take her network, build the inverse network, then have Bob solve clique on that network, take that same solution for her problem. Carol wanted to take independent set which she would basically do as she would start out with a graph then she would build the inverse graph and solve clique. That would give her the same solution as if she were looking for an independent set on the original graph. And it's the same for Bob here because if Carol finds a good algorithm for independent set, what he can do is he can take his graph, also build the inverse graph, and then solve independent set which means that either both clique and independent set are tractable or both of these problems are intractable but it cannot be the case that only one of them is tractable or intractable.

15 l Enter Alice

Having figured this out, of course, Bob and Carol are both very, very happy-- not because they've actually solve their problem but now they know at least they can collaborate. They've basically double their chances, so if any one of them finds a bit algorithm for their problem, they will know they have a good polynomial time algorithm for both of their problems. Alice also arrives to the meeting, and of course, Alice is very excited to hear about their discovery that Bob and Carol have made. Alice, of course, is hoping that maybe she can also find a connection of her problem to Bob and Carol's problem or at least that they can help her find a good algorithm for her problem by brainstorming. Alice explains her problem using the graph terminology as well, so Alice's problem as you will remember is to find a set of vertices so that all of the edges are covered in a network here. For example if you take these three, then each edge has at least one endpoint in those set of green vertices here. The input for Alice's problem is again a graph with n vertices and the output is the smallest set of vertices such that all edges are covered, and of course, you know what I mean by covered. By covered I mean that each edge has at least one n point in the set of vertices. Then we should give that problem a name to refer to it such as Bob was working on clique and Carol was working on independent set. We have Alice working on her problem, and since she is using the vertices to cover the edges, we'll call her problem vertex cover.

16 q Vertex Cover

To see if the problem that Alice is working on where it takes cover has any relation to either clique or independent set, the first thing that we could do is just take the graphs where we previously solve clique and independent set on and find vertex covers for that. This is kind of a double quiz because I would like you to figure out the size of the smallest possible vertex cover for both this graph here and this graph over here. Please enter your answer for this graph over here in this box and your answer for this graph here in this box.

16 s Vertex Cover

And the solution that we hear is that you can do a vertex cover with just three vertices, so here's one example. You could select this one which would cover all of those edges here. Now we could select this one, covering all those edges. And, finally, we could select the one down here. So all edges are covered selecting just three vertices, and there are alternative solutions but none that contain less than three vertices. Now what about over here--here we can find a very small vertex cover using just two vertices. This vertex here, we just don't even need to care about because it has no edges that it's connected to. We can select this one here--it covers all of those edges. And we can select, for example, this one here. Or not only for example because this is actually the only possible smallest solution. So now we did this exercise to see if vertex cover could have any relation to clique or independent set. So let's have a look back at which vertices were contained in the largest possible clique in this network, and those were four vertices--it was this one here, this one here, this one, and this one. Not any really apparent relation between the two problems. This one over here, however, is more interesting because in the largest possible independent set, we had those four here. So it could seem like vertex cover--those two green vertices here-- is exactly the opposite of independent set, so if you have found the largest possible independent set, then you have found the smallest possible vertex cover. And, indeed, it's actually not that difficult to see that this is always the case.

17 l Reductions

If you consider an independent set versus a vertex cover, what you have is the following. If you have the smallest possible vertex cover, that means you have selected a minimum number of vertices so that each edge is next to at least one of those vertices. If you were to remove those vertices from a graph, then no edges would remain because every edge is connected to at least one of those vertices of the vertex cover, so what remains is always an independent set. And since you've selected the smallest possible vertex cover, so the smallest number of vertices you need to remove, what remains must be the largest set or a largest set of independent vertices. And now that discovery is, of course, really great news for Alice, but it's also even better news for Carol whose smile gets a bit bigger, and, of course, also Bob because what they have now found out is this: clique and independent set are really closely connected. So if Bob discovers a good--or if any of them discovers a good algorithm for clique, then they have not only solved "clique" but also "independent set." So if just one of those two problems is solvable in polynomial time or, in other words, if just one of those two problems turns out to be tractable, then the other problem will be tractable as well. And now we have also connected vertex cover to independent set because we've basically figured out that finding the largest possible independent set for a graph is almost the same as looking for the smallest possible vertex cover. This also means that if you can find the smallest possible vertex cover, then you have found the largest possible independent set which we already know, through transforming the graph, lets you find the largest possible clique. So, actually, we can also draw this connection here because we already know about these two connections. What they have discovered is something that is commonly known as a reduction. And we'll get more deeply into that in the next unit. What a reduction is is basically a transformation between two problems so that if you find out that one of them is tractable, then the other one is tractable as well. So now the big question is, for clique, vertex cover, and independent set, are those three problems tractable--in which case Alice, Bob, and Carol would really keep smiling-- or will it turn out that all of these problems are intractable--in which case Bob, Alice, and Carol would tend to be rather unhappy? So which one is it going to be? Are all three going to end up very happy, are all three going to end up very sad, or is there maybe something in between? You can find out how that story continues in our next unit of our Introduction to Theoretical Computer Science. Plus, in the next unit, I will also tell you how you could quickly win a million dollars.

18 l Congratulations

Congratulation for completing Unit 1 of our introduction to theoretical computer science. We have introduced a lot of essential concepts and tools and you have managed to master them and now have them at your disposal. You know what a RAM. You know about O notation or big O notation. You know how to analyze algorithms and you've also learned to understand three very, very important problems in computer science known as clique, independent set and vertex cover. And while discussing these problems, you've also learned about the important concepts of polynomial time algorithms, exponential time algorithms, and the connected concepts of tractable problems and intractable problems. Having all of these concepts and tools at your disposal, you're ready to dive deeper into figuring out if we can make Alice, Bob and Carol happy by finding out that their problems are tractable or if we have to bring them bad news and show that their problems are intractable, but for now again congratulations.