# cs313 unit 6.1

## 01 l Introduction

As we've seen relaxing the requirement to find the best possible solutions can allow us to find polynomial time algorithms for NP complete problems. At least some times, so approximability is something that varies very much from problem to problem. For some problems, we know that we cannot approximate them at all, unless P=NP. Other problems such as vertex cover at least have a constant factor approximation algorithm. And then there are those very, very well-behaved problems such as knapsack for which we know polynomial time approximation schemes. But what if still all of this doesn't help? Well, in that case, we will literally have to start poking around. We'll be talking about randomization.

## 02 q Poking Around

So we have talked about using search trees, pre-processing, and approximation to solve NP complete problems. But there's one technique that we haven't yet talked about and that is using randomness to solve challenging problems. Now you know that so far in this course, I have always insisted on finding guarantees for the performance of algorithms. I wanted to have guarantees for the running time of the search trees. I even wanted to have guarantees for approximation. So am I ready to give that up now? Actually, I'm not, because even when we use randomness, we can demand certain guarantees from our algorithms. So, one way we could demand a guarantee from a random algorithm is, that we say it produces the correct or best possible solution with a certain probability. So best would be in the case of optimization problems and correct, of course, in the case of decision problems. And so, we'll also write this down for decision problems and finally we could also say that the algorithm has a running time that is random, and we want the running time to be polynomial with a certain probability and of course in this case here, we would be demanding that the solution is always the best possible solution or the correct solution. Well, actually we could also have combinations of these, but then the algorithm becomes rather weak in my mind and actually I'm not aware of any example where I have seen this before. Now, some of these approaches have special names and this one here is known as a Monte Carlo algorithm, and this one here is known as a Las Vegas approach. Don't ask me what that says about casinos in Europe versus casinos in America because I have no clue. Now the two approaches, Monte Carlo and Las Vegas are actually related to each other, and if you think about it for a little while, I think you can figure that out yourself. So my question for you is how are Monte Carlo algorithms and Las Vegas algorithms related to each other, and I'll give you three choices here. Is it that any Las Vegas algorithm can be turned into a Monte Carlo algorithm? So any algorithm over here can be turned into an algorithm like this. Is it the other way around, that any Monte Carlo algorithm can be turned into a Las Vegas algorithm? Or is it even that the two approaches are, roughly speaking, of course, more or less the same.

## 02 s Poking Around

And the correct answer is the first one--any Las Vegas algorithm can be turned into a Monte Carlo algorithm and the reason for that is the following-- In a Las Vegas algorithm, we said that the running time was polynomial with a certain probability and that means that we can let the algorithm run a few times, so sometimes it will take polynomial running time, sometime it would take exponential running time like this. So this is suppose to be the running time and let's say in the first time it's also exponential and we basically jut watch over the algorithm and as soon as it takes longer than polynomial time we stop it, we cut it off, and that would then give us exactly a Monte Carlo algorithm because here we said for this approach we demand that when we get the solution that solution is suppose to be the best possible one. So here the algorithm would also produce the best possible solution but we stopped it. We are stuck with some suboptimal solution or even no solution. Here the same thing as well, but then we started the next time and it just uses polynomial time and then we use that solution and this is exactly how a Monte Carlo algorithm would behave as well. It produces the best solution with a certain probability and in this case here this probability is dependent on the running time of the Las Vegas algorithm, which sometimes will be polynomial and sometimes will be exponential.

## 03 q Monte Carlo Algorithm

In this unit, we will be working with Monte Carlo approaches because we're looking at NP complete problems and actually as you will see there is not that many algorithms using randomness for NP complete problems anyway and one I'm going to show you in more detail is a Monte Carlo algorithm. Now the important thing about a Monte Carlo algorithm is this here, the probability. Now of course it would be great if the probability with which we get the best possible solution would be some constant because then we would have a very good algorithm as the next quiz will show you. So say we have a Monte Carlo algorithm with probability 1/2, how often do we need to run it until we are 99.9% certain that we have the best possible answer and please enter your answer here in this box. Now if you don't feel comfortable with probabilities, of course, you can also skip ahead, but even if you just have a basic knowledge of probabilities, you can probably get this one.

## 03 s Monte Carlo Algorithm

If we have a Monte Carlo algorithm that produces the best possible solution with probability 1/2 and we run it a couple of times, then what happens. If we run it the first time, we can be 50% sure that we already have the best possible answer. Then we run it again so the probability in the second run is again 1/2 that we do not get the best possible answer and in the third run, it's again 1/2 and so on. Now this is the probability of getting a suboptimal solution and of course, if we want to be should be smaller than 0.1% and then the problem becomes (1/2)^n and n is the number of times that we run this algorithm here should be smaller than 0.001, which is the same as 0.1% and then you can just solve for n and get the answer and we have to run this algorithm just 10 times which is pretty neat. Now of course for an NP complete problem, it's very unlikely that we get a probability like this because as you can see we could very, very, very quickly in polynomial time get a solution that is almost guarantee to be optimal. So what you will find for most Monte Carlo algorithms for challenging problems is that the probability here first of all becomes sometimes even exponentially small and second that it also depends on the input size but we'll soon see how that works out concretely.

## 04 q Guarantees

Now before we dive deeper into the algorithms, I want to make one important point. When you use randomness in computation, there is sometimes a danger of taking the poking around that I mentioned in my topic before this unit a bit too literally. So some people when they see an NP complete problem or an NP hard problem and NP complete of course being for the decision problem and NP hard for the optimization problem, just terminology means both problems are hard. And then some people would just say--oh this is NP complete and this is NP hard, well let's just try very, very many random solutions and pick the best one, which of course works better if you have an optimization problem than a decision problem, trying many random solutions sounds easy, actually it is easy, and you trust that the solution that it produces will be good enough. Well, sometimes you have no better choice than that, but in my mind I think you're playing a dangerous game and I want to illustrate this for you. So let's say that we play a game and the rules of the game are as follows-- you bet an amount x whatever you want and then you throw a die and of course, the value can be 1, 2, 3, 4, 5, or 6, and if you throw a number between 1 and 5, I will give you back x+30% of x, and if you throw a 6, I'll give you back 0. Now my first quiz for you and again this involves a bit probabilities should you play this game, and let's say you can play as often as you want--

## 04 s Guarantees

And the answer here is that--yes, I think you should play the game well, at least, if you're okay with playing games with money. So with the probability of 5/6, you're making 30% profit, and with the probability of 1/6, you're making 100% loss. So, your expected profit for each time you play is going to be 25% of this part over here minus about 16.7% for this part over here, which means each time you play, you make about 8.3% of profit on average as an expected value of course.

## 05 q Chance Flaws or Cheating

So, this game looks good, and you play it a couple of times, and now, the following scenario happens. Now, let's say you watch other people play the game first and you notice the following thing: In the last 40 throws, no six was thrown. My question to you is, of course, why could that be. Could that be due to pure chance? Could that be due to something being wrong with the die or maybe even someone is cheating? Please check all of those that are correct.

## 05 s Chance Flaws or Cheating

And the thing is all three are correct. So, 40 throws, no six, the chances of that are actually smaller than 0.1%, but that does not mean that it can't happen. So, it can happen by pure chance. Of course, you could also have the case that something is wrong with the die or somebody is cheating.

## 06 q Last Round

So you've been watching the game for awhile and then suddenly the person who is playing that game announces, "Last round!" So this is the last chance for you to play the game. And the odds already looked pretty good and also the last 40 times that you checked actually nobody lost. What would be the factors you would have to take into account to determine if and how much to bet in this last round? Now of course this is a rather subjective quiz, but I think we should agree more or less. So I will give you four opinions that you could have, and I also think that more than one of them are valid. So the first opinion could be certainly because the odds are fantastic. You get 30% back if you throw a 1, 2, 3, 4, or 5. So that will give you 30% profit. And in the last 40 throws, no 6 was thrown and the probability for that is smaller than 0.1%. Or maybe your opinion is, well, you can play a little. You know your chances of winning 30%. That's fine. But you should be ready to lose what you're betting here. The third option could be that it kind of depends on your risk affinity. So if you're ready to play a riskier game, you might want to bet a little bit more or stay out of the game altogether. And finally, one opinion could be well better not because it does seem a little bit suspicious.

## 06 s Last Round

And I think that the last three options could be valid responses. I think the first one is actually not. And the reason for that are two things: One is that I think it kind of does seem a bit suspicious but even if you don't think it's suspicious you don't really understand what's going on here. So on the last throws, it has all turned out very, very, very well. So no 6 was thrown. Always 30% payout. Now the reason could be pure chance and maybe the person offering the game at that point just had bad luck but the thing is you don't really know. So if you decide to play, I think you should only play a little and it certainly depends on your risk affinity if you even want to get into the game because I do think the whole thing seems a bit suspicious.

## 07 l Using Randomized Algorithms

Now what does this have to do with randomized algorithms? And of course any analogy can be criticized at one point or the other but I think it fits rather well. If you use a randomized algorithm without having any guarantees, well first of all, at a certain point in time, you have to determine that it's the last round. You're not running the algorithm any further. You've run it 200 times or 2000 times but then at a certain point you just have to say "Okay. This is it. This is going to be the result that I use." And it might be that the last 400 or 500 times or in this case 40 times the algorithm has not found a better solution. It has found one that seems very good and the solution has not improved so at a certain point in time you're just going to decide. Well, it's likely that this is a very good solution. Now, here's the thing where the analogy actually does not hold because if you think about it when we're dealing with an NP complete problem the chances of winning are not 5-6 over here. Indeed, we're playing a game with exponentially small probability of winning. It's like this, when you run a randomized algorithm without guarantees, you're playing a game with bad odds. You don't really understand what's going on and in the end even though you run the algorithm multiple times you only have one shot of getting it right at the point that you decide to stop your algorithm. Should you run a randomized algorithm without any guarantees, I think the rules are almost the same as the one for this game here. So if the stakes aren't very high so the probability of trying to solve this may be not that important and you're willing to take that risk then you can do it and sometimes it's the only possible way. But I think that an algorithm with guarantees is almost always preferable. First of all, you know what's going on and secondly and I think that's also very important to emphasize in order to get these guarantees here you have to construct an intelligent algorithm. So rather than just poking all the way around, you're poking intelligently. It's almost the same as with the search tree. So just using randomness in the way as brute force only that in this time it's random force. And I think the name already says how powerful that is but when you have guarantees, you're directing that random force in a certain direction and that of course tends to improve your results dramatically.

## 08 q Needle in a Haystack

Now in general and you might have come across one or more of these examples, using randomness is of course quite important in Computer Science. You might've come across randomness, for example, in crytography where randomness is in a way the only way to fully securely encode a message. You might've encountered randomness in something called symmetry breaking. Let's say you walk down a hallway and somebody comes in your direction, and you're trying to get out of the way of that person but the moment that you get out of the way, that person also tries to get out of the way in the same direction, and so if you did not have random decisions, then you would almost certainly crash into each other. And finally randomness is often used in simulations, so when you do physics simulation or financial scenarios, then of course, also randomness is something very, very useful. Now, the difficulty with using randomness to solve NP complete problems is this, in all of these examples here, being as random as possible is something rather desirable. So in cryptography, you want perfect randomness because that gives you the most security. Symmetry breaking works best if you have as much randomness as possible, so that these two entities, say, will make completely independent decisions of each other. And in simulation of course too, because the point of using randomness in simulation is to simulate random processes, so you want to model that as good as possible. Now for NP complete problems, this doesn't work because for NP complete problems, we don't want perfect randomness. What we are looking for is a needle in a haystack and finding a needle in a haystack simply by poking it from a few random directions won't really help. And that is why we need to direct our randomness. Let's say, you use randomness in a very naive way to solve clique and just assign all of the vertices, random values of 0 and 1. Now let's say you're doing this for a graph with n vertices. Well, actually I think you can answer this question if I give you four choices from which to choose. So is it 1/n²? Is it (1/2)^n? Is it 1/2 n? or is it 2^n/n²?

## 08 s Needle in a Haystack

And of course, the answer here is 1/2^n because for each vertex you have to make the right choice. If the vertex does not belong to the largest possible clique, then you have to assign it a 0 and if the vertex does belong to the largest clique, then you have to hope that the random algorithm you're using is assigning it a 1. And the probability of each correct assignment is one-half and you have to do this n times. So the probability of finding the largest possible clique using simple pure randomness, which of course is the same as 1/2^n which is exponentially small. We can't just hope to get the best solution by pure chance alone. We need a more intelligent or strategic approach for this.

## 09 q Strategy

And there are several ways to employ randomness in a more intelligent way but there is one that is actually quite common. Let's assume that this here is a huge list of all possible solutions. So, for NP complete problems or NP hard problems, this will be an exponentially large list, and the strategy of more intelligent random algorithms, and we're soon going to see a concrete example for that, is basically to randomly pick out certain solutions here in this list, and let's say that the list is ordered in some way, which I'm going to explain in more detail when we get to the concrete example, and then, the strategy is to explore solutions that lie around the ones that we randomly picked so like this. So, you would look at all of these solutions here, something like this. Now, of course, each time the algorithm runs here, it can only investigate a number of solutions that is polynomial in size, but of course the list of all potential solutions, so this whole thing here, has exponential size, and through the way in which you design this algorithm, well, first of all, you want to ensure that you land in good places so to say, and this is often, of course, not very trivial to prove how to do that, and the other thing is that you want to have a good and intelligent exploration of the area around your solution again something which is of not very easy to prove or show. Actually, we can't expect it to be any other way after all we're solving a very challenging namely NP hard or NP complete problem here. So, just to prepare you a little bit for the realities that you are about to face. Let's do another quiz. Let's say we have a randomized algorithm for an NP complete problem and that algorithm has a fixed error probability. Let's say the algorithm makes an error with a 90% chance and only gives us the correct answer with 10% chance. So, should we expect that a randomized algorithm for an NP complete problem that has a fixed error probability so to say with 90% chance gives us the wrong solution and with 10% gives us the correct one? Should we expect such an algorithm to run in polynomial time? And I'll give you three choices, and as always, more than one can be true. The first choice I'm offering you is no because if we would run that algorithm just a few times we would almost be guaranteed a correct solution even though this algorithm here that I am asking you about makes an error with a 90% chance for each time that we run it or if the answer no we should not expect that because the number of potential solutions is exponential, and the algorithm, since it runs in polynomial time, and this of course anthropomorphises a bit can only afford to check a small part of that exponential solution space if you will. And finally, one choice I'm offering you is yes, we should expect that because the laws of NP completeness that we have gotten to know in the past units do not account for randomness. So, just as these laws don't account for small basis or if we are content with an approximate solution, it could be that randomness makes such algorithms possible. So, please check all of these here that you think are correct.

## 09 s Strategy

And the correct answers here in my opinion are the first two. The first one is even if the algorithm makes an error with 90% probability, running it only a few times, it does guarantee a correct solution and that of course as you've seen before these probabilities here have to be multiplied with each other if we run the algorithm a couple of times. So, if you run it the first time, of course we have a 90% error chance, but if you run it the second time, we only have (90%)² error chance. If we run it three times, we have (90%)³. And once we have run this algorithm 50 times, we have an error probability of 90%^??, which is about 0.5%, and running it a couple of times more of course, we can even get this figure out much much lower. So, since we only have to run it a constant number of times to get the error very very low, it would mean that we're staying still in polynomial time, but we're almost guaranteed a perfect solution. Now, I'm not saying that this is impossible because the laws of NP completeness indeed do not account for randomness but of course it's very very unlikely It's almost the same as with the approximation algorithm where you wouldn't expect an approximation algorithm with an approximation factor of 1.01 even though for some problems it has not been proven that that is impossible but you still wouldn't expect it. Now, the second choice here I think is also correct. And that is basically my reasoning why I say you shouldn't expect this. The number of potential solutions for an NP-complete problem is exponential. Since the algorithm only runs in polynomial time, it can only afford to check a small part of that solution. That is basically the image that we just had where you only able to explore small areas of the whole solution space. And of course you have to employ the strategy. I just think that with this strategy it is highly unlikely to get an algorithm with this performance up here. It just doesn't make sense because you're still poking around a lot. So why would you get a fixed error probability in an exponentially large space if you can only afford to check a polynomial size part of it. And finally as I said the laws of NP completeness indeed do not account for randomness, but the thing is this just because that is so it doesn't mean we should expect anything to happen. Again, I think this is the same thing as with the approximation algorithm. Of course for some problems it would theoretically be possible to have a 1.01 approximation algorithm, but it just seems too unlikely to be true.

## 10 l Wishlist

So in the last quiz, you have seen that it's not very likely that we can find a randomized algorithm for an NP complete problem that, in polynomial time, will deliver us the right answer with a fixed probability. To be honest, we currently don't know if such an algorithm is possible or impossible but many believe that it is highly unlikely, and so far, no such algorithm has been discovered. It seems that we can sometimes, however, if we either crossed this one here of our list or crossed this one here of our list, and you'll soon see that- actually it doesn't really matter what we crossed of our list. But if we kind of reduced the number of wishes that we have, it seems that sometimes we can use randomness to at least improve the running time of exponential time algorithms. So just as you've seen when we optimized such this, it sometimes also possible to improve the running time of exponential time algorithms or the base of the exponent using randomness. Now, the question of course is could we use randomness to help Alice solve vertex cover or help Bob solve this clique problem or Carol her independent set problem or Davis' shortest tour problem at least when we say that we don't have this requirement here. And the answer unfortunately is we don't know. We don't know if there are good randomized algorithms to solve vertex cover or clique or independent set or shortest tour at least not if we are demanding guarantees. So, of course, there are randomized algorithms that will look through a random number of solutions, and these are also used in practice, but again, they don't offer any guarantees. There is one problem, however, that is NP complete where I can give you a randomized algorithm that at least is better than the best known deterministic algorithm and that problem is 3-SAT. As I hope you will remember, the 3-SAT problem had as an input, a Boolean formula with n variables. And the Boolean formula had a special property because it's 3-SAT. Each clause in the Boolean formula has exactly three variables. And the output that I'm looking for is again an answered to the question if this Boolean formula here has a satisfying assignment. So how can you use randomness to solve this problem? Well, there is a very simple and, in my mind, very elegant algorithm that in 1999 was proposed by Uwe Schöning to solve a 3-SAT instance with n variables and this here is the algorithm so the algorithm will start out by picking a random zero, one assignment for the the variables. Zero meaning false and one meaning true and then it will repeat 3n the following loop. If all clauses are already satisfied that it has found a satisfying assignment then it would actually stop and then it will randomly flip one of the variables of that clause, and by randomly flipping, I mean the following. So it will take this clause, which has three variables, and the clause is not satisfied. It will pick one of the variables, and if that variable is true, then it will set it to false, and if it's false, it will set it to true. Now, notice that this will automatically satisfy this clause, because the clause is not satisfied. So this variable here is a variable or the variable with a knot will evaluate to false. This part here will evaluate to false and this one here. So if I flipped any of those variables then the clause becomes satisfied. Now, if we run this algorithm here once, what are the chances that it will succeed and find a satisfying assignment provided that the initial Boolean formula does indeed have a satisfying assignment? If it doesn't have one, the algorithm can't find but, but let's assume it has one. Then the success of this algorithm depends on two things. One is how far of this initial solution here is from a satisfying assignment. And the second one is, of course, are these random choices that we make down here flipping variable successful. So, if we are somewhere near a satisfying assignment, and of course, by chance the algorithm makes the right choices here then we would find a satisfying assignment provided that the original Boolean formula actually has such an assignment. So let's look at these two success factors here a bit more closely.

## 11 q Success Factors

And of course, again, we'll do this as a quiz so, my first question for you regarding the first success-factor is this. In this line here, when we pick a random (0, 1) assignment for the variables how many variables can we expect to get right on average? By right, I mean, if in a satisfying assignment the variable is set to false, then this random assignment here also sets it to false and if it needs to be set to true this random assignment here also sets it to true. So, my three choices for you here are the following: Can we expect to nearly get all the variables correct in this random assignment here? Can we expect to get about half the variables correct in this random assignment here? Or is it not possible to make a statement about this here?

## 11 s Success Factors

And the answer here may be surprisingly is that using this random assignment here we will on average already get half of the variables right and that is of course because we have a 50% chance for each single variable of actually getting it right when we do it by coin flip because if in a satisfying assignment this variable needs to be set to false, there's a 50% chance that a random assignment will also set it to false and if it needs to be set to true, there is also 50% chance that we actually set it to true using this randomized assignment. Now, of course, sometimes we might get more variables correct and sometimes less but especially if N is large we can expect to get about half of the variables right in an average run. Now, the success-factor II, once we are half way there so to say what are the chances that using this process here will get us through the satisfying assignment again, if the original Boolean formula actually has one of those.

## 12 q Success Factor II

So let's assume that we have an unsatisfied clause then we would pick one of those clauses and randomly flip one of the variables So, what I would like to know from you is what can we say about the condition that we find ourselves in, in this case over here? Is it true that at least one of the variables of that clause must be flipped? Is it true that all of the variables of that clause must be flipped? Is it true that the chances of flipping the correct variable are at least one-third? Or is it correct that the chances of flipping the correct variable are at most one-third?

## 12 s Success Factor II

If you're a bit quiz savvy, probably this one was very easy to get, but I'll explain the answer to you of course as well. So, at least one variable must be flipped and the chances of flipping the right variable are at least one-third and I''ll explain it to you. Now, let's say we have one clause and that clause has 3 variables and it's not satisfied. So, if the overall Boolean formula has a satisfying assignment this means that in the satisfying assignment either this variable here, this one here, or this one here must evaluate to true and currently, they all evaluate to false. So, at least one of those variables must be flipped in order to satisfy the clause. What are the chances that we're flipping the correct variable? Meaning that we indeed choose the variable to flip that would also have a different value in the satisfying assignment. Well, since we select one out of three variables those chances are at least one-third. So, they are exactly one-third. If in a satisfying assignment, you have exactly one variable that evaluates to true or a variable set to false with a knot and the chances are even better if in the satisfying assignment you would have two of three variables that would need to evaluate to true, but there are always at least one-third. We initially can expect to get about half of the variables right and a random change will be successful or a random change and clause that is not satisfied will be successful with the probability of at least one-third and that of course also means that the probability of making a mistake meaning that we flip a variable that we should actually not have flipped is at most two-thirds.

## 13 q Distance to Satisfying Assignment

So another way that we can understand this is that we start out with a random assignment that has a distance of n/2 to the satisfying assignment Again, assuming that assignment exists. So, by distance, I mean the number of variables we need to flip. We can expect this to be about n/2 as we just found out. Each time this algorithm here goes through the loop, there's two things that can happen The first thing is that we move closer to the satisfying assignment and by closer, I mean that we gained one more variable that is set to the same as it would be in the satisfying assignment. So basically, making a step forward happens with probability one-third but of course as we also just found out, we can also make a step backward and kind of flip the wrong variable and the probability for doing that is at most two-thirds. Let's show worse case analysis here so we will erase this one here and say in a worse case, we make a step forward with probability exactly one-third and we make a step backward with probability exactly two-thirds. And how often do we make this step? Well, the loop here is executed 3n. So what happens is the following, we start off here with our random assignment and then 3n We either move forward with probability one-third or we move backward with probability two-thirds. And then the next time assume that we are lucky, we have move forward, we again move forward with probability one-third and backwards with probability two-thirds and this goes on and on and on and on until we either reach the satisfying assignment or which is much more likely, we don't reach the satisfying assignment. We might even get farther away from it. Actually, chances are pretty good that we are. Now, the nasty thing here of course is that we move forward with the lower probability than moving backward. So, the probability of coming from a random assignment to the satisfying assignment by running the algorithm once actually doesn't seem too good and the exact probability analysis here is somewhat complicated especially if you haven't had an intermediate course in statistics yet, but what we can try to do even if the statistic analysis is may be a bit complicated is at least do a simulation. So, run a program to do simulation for me. Assuming you start at a distance one half from a satisfying assignment What's the probability of making n/2 steps in the right direction if the probability of making a step in the right direction is one-third and the probability of making a step in the wrong direction is two-thirds and you try 3n and then run this for different values of n and here's what I got. Now, of course, when you run such a simulation yourself, you might get a little bit different results, but you will be more or less in the same range, I hope. Here are the results that I got. For n equals 10, you end up at 2.8% probability of making n/2 steps in the right direction. For n equals 20, it's 0.084% and you can see that it rather quickly decreases. Now, my question to you here, does the probability decrease as a function of n again logarithmically with n, linearly with n, or exponentially with n.

## 13 s Distance to Satisfying Assignment

And as you can see, each time we add 10 to n, so 10+10=20+10=30+10=40, the probability of reaching a success decreases by well more or less with a factor of 10 if not more. So this is clearly an exponential decrease. As we add a constant to n, the probability decreases by a certain factor and that factor is one-tenth or ever worse. So we definitely can see an exponential decrease of the success probability as a function of n. Now, the exact analysis of this algorithm is not within the scope of this course, although I would say it's understandable if have had a mid-level college course on probabilities and statistics.

## 14 q Expected Running Time

And the exact analysis of this algorithm here reveals that if the Boolean formula has a satisfying assignment, then this assignment is found where the probability of about So how many times do you have to run this algorithm given the success probability here to find the satisfying assignment on average, assuming there exist one. Now, if you have had a basic statistics course, you should already know that the expected running time, so how often we have to run this algorithm until we find a satisfying assignment assuming the Boolean formula has one is the time for a single run divided by the success probability of one run. And of course, I'm going to let you figure our the final answer for this as our next quiz. So is the expected running time O(3/4^n times the polynomial of n) which is this polynomial here times some other polynomial, which will represent the time for one run of this algorithm here or is it O(4/3^n times the polynomial of n) times the time required for this algorithm here, or is it O(3/4^n times the polynomial for this algorithm here) divided by this polynomial here, or finally, is it O(4/3^n times the time required to run this algorithm here) divided by this polynomial.

## 14 s Expected Running Time

And of course the correct answer here is this one because you're taking the time for one run, which is into the power of c and then you're dividing by the success probability, which is this here which means you need to divide by the polynomial and you need to divide by 3/4^n which is the same as multiplying by 4/3^n.

## 15 q Big O

So the expected running time is O(4/3)^nn raised to the power of some constant divided by a polynomial of n, and actually, we can combine these two terms here because this is a polynomial of n and this is a polynomial of n, which we can simply write as n to the power of some constant. Now, finally, let's see if you remember the use of O notation correctly. What does that equal.

## 15 s Big O

And of course, this is a bit of a nitpicker question but it equals O(1.3c?^nn to some constant) and the reason being is that 4/3 is equal to 1.3333, of course, this goes on infinitely, and for O notation to be correct, of course, you have to round up, no matter what the digit is that you're rounding. So the expected running time for this algorithm here is actually exponential in n. Now, you might be asking--well, okay, so we know that three SAT is solvable in exponential time, now we introduce randomness, which means we're not even sure that we're finding the satisfying assignment, at least not totally sure, and we're still ending up with exponential running time, so what's the deal here.

## 16 q Evolution of 3 SAT Algorithms

This table here is taken from a survey by Tobias Riege and Jorg Rothe. I had chose the development of better and better algorithms for 3-SAT So up here, you can see the deterministic algorithms for 3-SAT and down here you can see randomized algorithms for 3-SAT and you can see how over the years the algorithms get better and better and better and the same thing over here. Now, this is of course going to be a rather easy question, but which of these two algorithm categories generally seems to be better Is it the deterministic algorithms up here or the randomized algorithms down here.

## 16 s Evolution of 3 SAT Algorithms

As you can see the randomized algorithms generally have much better time bounds or well they might not seem much, but you already know that for exponential time algorithms little differences here so 1.47 versus 1.32 actually makes quite a difference so they have generally a lot better time bounds than the deterministic algorithms It seems that randomized algorithms allows us to achieve a better worst-case running time than deterministic algorithms. Of course, up here it has the advantage that it's actually done after this amount of time whereas down here, it's always an expected running time and as I said you're gambling a little bit. You cannot be 100% sure that the algorithm has found a satisfying assignment even though there might be one.

## 17 q Randomization is better

The question, of course, is if this table here actually suggests that randomized algorithms for 3-SAT will always be faster than deterministic algorithms. And I would argue no here, simply for the reason that such a statement would in some way have to be proven or at least argued for with more than just empirical data, but I believe we can also think of some other reasons why randomized algorithms currently have better upper bounds than deterministic ones. So my question for you to think about is, what could be the reasons that the randomized 3-SAT algorithms that we've just seen in the table have generally better bounds than the deterministic ones? Could it be because randomization is more powerful somehow than determinism? Could the reason be that no one has yet had the right idea simply how to design a better deterministic algorithm? Or could it be that analyzing deterministic algorithms is in a way more complex than analyzing randomized algorithms?

## 17 s Randomization is better

And I would ague that all three arguments here are in a way correct. So you could argue that randomization is more powerful than determinism. Of course, we don't know why and we don't know if that is so, but it could be a valid reason that somehow randomization is a bit more effective but I think you could argue that randomization is in a way or that randomization could in a way be a bit more effective than determinism. After all, you're not really saying how much more effective, but maybe you gain a little edge through trying randomization. Of course it could also be that simply no one has had the right idea yet how to design a really really good deterministic algorithm for 3-SAT. Of course, the links of the table that I've just shown you suggest that people have put a lot of thought into this but maybe tomorrow, somebody comes up with a really good idea for a comparably fast exponential time algorithm for 3-SAT, and finally, I think this is also an important point, it could be that analyzing deterministic algorithms is just way more complex than analyzing the randomized ones. The 3-SAT algorithm, the randomized one that I've shown you with a running time of 1.334^n some polynomial is very very simple and it's analysis is also not that complicated. For the deterministic algorithms, however, you have to go through a lot and lot and lot of analysis to prove those bounds. So, it could be that you could modify some deterministic algorithm to be better than the randomized one but simply the analysis required to do so, is much more complex and by analysis, I really mean that you don't really have to modify the algorithm that much but just look at its performance a little more closely. So, there are couple of reasons why the randomized 3-SAT algorithms could be better than the deterministic ones, but of course, we don't know for sure.

## 18 q When to use Randomization

Why have we spending so much time on 3-SAT in this unit instead of our usual problems, vertex cover, clique, independent set, and shortest tour? The thing is, there are very little know results for randomized algorithms for these problems here. Of course, there are still many randomized algorithms for these problems here especially for shortest tour, some also for clique and independent set, but these are all algorithms where you don't know any guarantees. You don't know if the algorithm will perform well or if it will produce a very suboptimal solution, which of course leads to a question where you could ask, well, in what cases might it be okay to use an algorithm that is randomized and does not offer any guarantees? And of course, I'll let you think about that. Would it make sense to solve a problem with a randomized algorithm where you don't have any guarantees if the stakes are low? If no better algorithms are available, so the deterministic algorithms have failed and there's no randomized algorithm known with a guarantee, is it when you can think of any better algorithm, or is it when nobody you know including yourself knows any better algorithm than a randomized algorithm without guarantees?

## 18 s When to use Randomization

It's okay to use such an algorithm if the stakes are low. It's also okay if no better algorithm is available, I mean if you need to solve your problem, better have a suboptimal solution than no solution at all. Only if you don't know better, that was probably the only answer where I would argue with you because even if you don't know better algorithm for a problem, you should ask around. So, for example, when you're working on a problem like shortest tour, there are lots of specialized solutions, specialized algorithms, special cases, so better talk to an expert before settling for a suboptimal algorithm, but of course I realized that this quiz is a bit subjective and you might object to some of my answers. For example, saying that the stakes are low is something that is rather dangerous because sometimes even a small improvement can be worth a lot of money or save lives, so you have to be careful with this one, and also here if it's a high-stake problem, well, maybe it's worth putting some thought into it and designing your own better algorithm especially if it's not one of those famous problems here, chances are that maybe nobody has looked at it closely enough. In any case, and I think you remember my point from the beginning of this unit, using a randomized algorithm without any guarantees or detour in sight is always a gamble. So you need to think about if you want to take that gamble. It might be okay, it might not be.

## 19 q Randomization vs Determinism

It's actually a wide open discussion whether randomization is actually helpful because with considerable analysis many randomized algorithms can actually be turned into deterministic algorithms by using a process called derandomization, and derandomization usually keeps fundamental properties such as polynomial running time, all the base of the exponent. Randomized algorithms therefore might be a bit faster but it's unclear whether this is due to luck if you will or lack of thinking. So, if you were to pit these two against each other, randomization versus determinism, it's not really clear who would win or if it's a draw. So to conclude our exploration of randomization, what might be some reasons that randomization is actually at least a bit more powerful than determinism? Is it that randomized algorithms are in some way harder to trick into a worst case behavior as compared to determinism and after all we're analyzing all of the algorithms using worst case analysis. Is it that randomized algorithm can usually be easier to analyze than deterministic algorithms or is it that randomized algorithms actually explore solutions that their deterministic counterparts could miss.

## 19 s Randomization vs Determinism

And I think there are two correct answers here. The first one is indeed that randomized algorithms are in some way more difficult to trick into a worst-case behavior than deterministic algorithms. Now, to illustrate this for example think about the greedy algorithm that we use to approximate vertex cover. We could trick that approximation algorithm into a worst-case behavior simply because we expose the way that it works and we knew how it would work because it works deterministically. For randomized algorithm, of course, we never quite know what it is going to do. So, constructing a worst-case instance or finding an instance that really exposes worst-case behavior here might in some way be more difficult than for deterministic algorithm. Randomized algorithms can definitely be easier to analyze than deterministic algorithms although of course this one is a little bit debatable since statistics and probability can also get very nasty but when you get down to search tree analysis versus the analysis of random algorithms and with what probability they produce correct results usually these over here are easier, and we also had that in the last quiz where I mentioned it. Now, finally a random algorithm does not explore solutions that a deterministic algorithm would miss. Rather, it's the other way around. The deterministic algorithm at least if it's not in the approximation algorithm will always guarantee that you find a correct solution. On the other hand, with the randomized algorithm, you know about your probabilities because you're demanding guarantees, but nevertheless, you can never be 100% sure. You might be 99.9999% sure that it produces the correct answer but never 100.