cs313 »

For some problems, there's no constant factor approximation algorithm unless P=NP or there are constant factor approximation algorithms but we know certain limits for how good that constant factor can get. There are other problems that allow for what is called a Polynomial-Time Approximation Scheme and the idea behind the polynomial-time approximation scheme is basically that if you put in more running time in to your algorithm, and that algorithm is called a polynomial-time approximation scheme, then you will get a better solution or a better guarantee for your solution, and depending on how good you want that guarantee to be, then again certain implications for the running time you have to put in. So, for an almost perfect solution, you will likely have to put in time that is pretty close to exponential time or even exponential time and for a solution that is--well, maybe only within a rather course bound of an optimal solution, you will need less time. Now, the NP complete problems that we have encountered so far obviously fall either into the category of having a constant factor approximation algorithm or admitting no such algorithm and that is why I'm going to introduce a new NP complete problem to you now called knapsack. A knapsack is a very simple to describe. Knapsack gives you as input a set of objects such as these here and each object has a size and a value and those are both integer number, so you have an integer size and an integer value and we're soon going to do an example for this, and then, additionally, you have given a container and that container has a limited capacity, and that capacity again is an integer, and the question you're trying to answer with knapsack is actually very simple, you're trying to ask the question, "What is the maximum value that I can put into this bag here while observing the limited capacity?" So the total sum of the size of the objects that I select to be in the knapsack, so say I select this object to be in the knapsack and this one here but I don't want this one here, then that means that the total size of these two objects cannot exceed the capacity of the container and among all other possibilities of putting objects into the container without exceeding maximum capacity, this gives me the best possible value, and of course, here for the values, we again do the sum to calculate the total value. Knapsack is known to be NP complete and now let's do a little example for this problem just to familiarize yourself with this and I'm going to give you a number of objects here, and of course, each of these objects has a size and a value. Now, my question to you is if I give you a container and that container has size 10, which objects out of these seven here should you put in to the container to maximize the value without exceeding the size here. Please check all the items that you should put in to the container to maximize the value.

And the correct here is that you should put this one here B, C, F and G. Now, I've chosen the sizes and values so that this is not that difficult for you to figure out even though it is an NP complete problem in the general case. I suppose you have seen that there's lots of objects here, which have a comparably small size and a very large value. So probably putting in F and G is rather obvious and also I would presume putting in B and then once you're there, you can see what else you got and you've already used up And then the best value of the remaining objects--A, C and D is given to you by C, which has a value of 5 as oppose to 4 and 2, and also you cannot put in more than one. So that makes C the only candidate to also put into that container, and for that, you get a total value of 3+5+5+2, so you get a value of 15 in your container. Actually, not using up all the space in the container because you only have size 2, 4, 2, and 1, which is the total size of 9. So when a container of size 10, you can carry a value of 15 given these objects over here.

Knapsack is a very interesting problem because in a way and this is very informal, of course, it is among the easiest NP complete problems. So, where as clique and independent set for example seem to be very difficult, Knapsack allows for some pretty interesting algorithms and both of those algorithms which will be important for our P-test is called pseudopolynomial algorithm and you'll soon see why it is named that way. I'm first going to explain the algorithm to you; so you're given an instance of knapsack, which means you're given n objects. Each one has size s so we'll just label those sizes here S?, S? until you get to Sn. Each object also has a value so the values will be labeled in the same way. So you have V?, V? and so on until you get to V-sub-n. And now, here's the algorithm to solve knapsack. The algorithm basically works for the table that looks like this. The table has n rows one, two, three down to n and the table has a lot of columns one, two, three, four, five, and so on and so on and so on until we get to a number that I'll call V. And by V, I mean the sum of these values here. So the total V? plus V? and so on plus V-sub-n will be called B. And this is the table that we'll now use. Of course, I still have to tell you what values go into that table. Each cell here in this table is going to have the following meaning. I'll call the row that we're at I and I'll call the column that we're at J. Then, this cell here in row I and column J has the following meaning The cell here in this table will be the minimum size required to achieve a total value of exactly J Using the objects one, two, three, four, and so on until you get to I, so you cannot use all objects So, for example, in the first you row, you can only use object one In the second row, you can only use objects one and two and in the third row, objects one, two, and three and so on and that means that sometimes, of course, it's not possible to achieve a value of exactly J using these objects and so, we'll just write a dash into the table if that's not possible. So, let's use the example from before so you have the objects A, B, C, D, E, F, G and here we have the sizes and the values. So, the sizes were three, two, two, three, and so on. And to construct this table here, we'll, of course, have to label those items here So, this is item one, this is item two, three, four, five, six, seven. So, my first question to you now, a brief quiz here. What do we put in here? So, how many columns does this table have and how many rows does this table have?

So the number of rows is the number of objects and that is 7, and the number of columns is the total sum of all the values, so 2+3+5+4+3+5+2 which is 24. So, we have 7 rows and 24 columns in this table. We're going to fill this out row by row, so we start in row 1 then row 2, and so on. So for row I and column J, what we want in this table is the minimum size required to achieve a value of exactly J using the objects 1 to I. Now in the first row, we only have the first object available, so the only thing that we can achieve is a value of 2 using size 3. So, value of 2 using size 3 and all of the other ones are just not possible. Second row, we have two objects available, we have objects A and B available, so what we can achieve is a value of 2, a value of 3, and a value of 5. Now for value of 2, we need size 3 again. For value of 3, we can do that for size 2. And for value of 5, we need size 5 and I'll put out one more row here for you. I'm not going to show you the part that goes beyond 8th row because that will not be important for what I'm about to explain. The interesting part here is this one here because object number 3 has size 4 and value 5 which means that we can achieve value 5 actually by putting in this object instead of using A and B as we did in the row before. Let's go to row 4, because in row 4, I can show you something very interesting. First value is not that interesting. So remember that in this row here, row number 4, we now have available to us, object A, object B, object C, and object D. If we are to go through the whole table, how efficiently can we fill out this table? And now that we have D available here in row number 4, I can show you something very interesting because you can fill out this table quite efficiently. Now, here we are in the column labeled 7 which means we want to achieve a value of 7. How can we achieve a value of 7? So you already know that we can achieve a value of 7 using these three objects, A, B, C because we have this row here. Now, what are the different possibilities of achieving a value of 7? We could either do this through the row above because we know that we can achieve it using objects A, B, and C, so it could be row above. We could also try to take D directly. That D has a value of 4 so here, that's impossible, so take the new object and finally there's only a third possibility here. Well, since D has a value of 4, we could try to achieve a value of 3 using the objects we had in there before and then add D to it. So add D to another set and that is the set of size 3 and we know just from looking at this table here that at size 2, we can already achieve a value of 3. D has a value of 4 and has a size of 5, so 2+5 is 7, which is actually now the same as this one here but of course you would be taking the minimum of the required sizes. So here, we would enter a 7. The interesting part is this. In order to fill the cell here, we do not need to go through all possible combinations of the objects that we have, we just need to consider three cases. Each of which can be calculated in constant time if we have the row previous to the row that we're trying to fill out.

And of course, here it's time for another quiz.
So my first quiz to you--you can already see there are going to be two--is what is the size of this table?
Is is O(n²), is it O(n*v) and you remember v is the sum of all values, or is it even precisely n v?*
And of course, I want you to give me the best possible answer here.

And, of course, the size of the table is exactly n*v because we have n rows*
and v columns, where v is the sum of all the values.
So the size of table is n*v.*

And of course, once you have the complete table, then you also have found the solution to knapsack
because once you have the complete table, you just start out in the right-most column
and then go to the left, to the left, to the left, to the left, and each time, you look through each column
and find what is the minimum size that I would need to achieve exactly this value
and once you find a size that is at most as large as your container you would be done.
So constructing this table here solves knapsack.
Now, my question, the second quiz is of course the time required to build this table.
Is that time 2^n given that we have n objects here, is that time O(2^n*n) or is that time O(n v)?*

And the surprising thing here is--well, it's not surprising anymore, but surprising once you see it--
is that the time to construct this table is O(n*v).*
There's no exponential part in the running time for bringing this table, apparently at least.

Now, wait a minute, we have an NP complete problem here knapsack and have just shown
that the running time for solving this problem is of the order of n*v given n objects*
and the total value of those objects, if you sum it all up, of v.
Now, I just have to quiz you here of course.
Does this mean that P=NP, although I've told you we don't know, and of course, I don't?
I just want you to select something here but also to think about why.

Now, the answer is rather obvious because I've already told you that we don't know if P=NP and actually, we believe that it isn't the case. The interesting part about this one here is why is the answer no, and the answer is that v might actually be exponential in n. So when we build this table here and you remember it has n rows and it has v columns, v could be some exponential function of n, so v could for example be 2^n because we haven't specified the maximum value of those objects here. And that is why this algorithm is a pseudopolynomial algorithm because it looks very polynomial, but actually there are cases where it is not polynomial. But you can already see why I said at the beginning that knapsack seems to be one of the rather simple NP complete problems--simply because the existence of the pseudopolynomial algorithm here.

Now, if you really want to bug me here, you could of course now argue the following-- why not just divide all of the values--each value of each object by v. Then the total sum of objects would be 1 and the running time here would presumably be O(n). So is that a possibility?

And the answer to that question of course is a little more subtle. So there's two reasons why that doesn't work. The first reason is a very formalistic one. The RAM can only handle integers. Or at least the RAM cannot handle numbers with an arbitrary precision because having an arbitrary precision is exactly the same thing as having an infinite amount of values available for each variable. And we said that the RAM doesn't support that. And actually your computer also doesn't fully support that. So you might be working with numbers such as 1.2345 or so, but these numbers are always rounded numbers. So also your computer does not have arbitrary precision. It' s pretty well thought out normally how your computer handle such numbers which is why we normally don't notice but in general there's no arbitrary precision for your computer also. Secondly, the table size wouldn't change. So we're not allowed to do this. But let's say we're allowed to do this and then represent all of the values in the table exactly. Then, the table size itself wouldn't change because you start out with a table that has And now in your new table, you would just have the same amount of columns only the columns would be labeled 1/v, 2/v, 3/v, 4/v until you get to 1. The third and the fourth are not valid reasons. Just because then P would be NP is no counterargument against this. And of course, as I've just shown you, that is also not true. But asking this question here leads us to something else. What if we were to divide all values by some number, which I'm going to specify to you, and then round it down. By rounding, we do two things. So by rounding, we would make numbers that the RAM can handle. And also in the case of rounding, the table size would change. So let's say you had a table that looked like 0.5, 1, 1.5, 2, 2.5, 3, and so on. Then, you could always put two columns together. So these two. Except for the first one. That would become 0. But all the others you can put together. So those two would become column 1. Those two would become column 2. If you have 3.5 here, they would become column 3 and so on. So you would be able to scale down the table but of course the whole thing comes at a price. Because scaling down the table means that you put two columns together only taking the minimum of it and therefore possibly arrive at a suboptimal solution. Because certain objects where one object has a slightly higher value than another one become indistinguishable using this technique. Well, we're not going to divide by V. We're going to divide by some factor. Dividing all values by a certain factor is a good idea but we have to round and this gives us an error. Now, if you take a close look at this, you can almost see the approximation algorithm emerging here because what happens when this factor here becomes very large? Well, then we scale down the table that we build a lot. So here we have n and here we have v. If we divide all values by some large factor and of course we have to round, then this table here becomes very small but of course our error also increases. And if this factor here is very small, which means we almost keep all of the original table, well then our running time is still very large but the error that we make is potentially smaller. So this is exactly where the time and quality tradeoff will come from. Let's now have a look at that in more detail.

Now, let's have a closer look at approximation and again, we're assuming that we're given
an instance of knapsack, which means we have any objects and the sizes
and the values of those objects are given to us
and you know that the total sum of those values here we'll call V.
So what we said now is that we wanted to scale down all of these values.
So we'll scale this down to V?/x and then round down and I'm going to write the rounding down
like this, which is called a floor symbol--in case you haven't come across that.
So I'm going to write down the same thing here, so this is going to become V?/x,
this here is going to become V?/x, and so on.
We're going to have Vn/x.
So what rally do we use here for the x.
Well, of course, that takes a bit of either playing around or pre-knowledge.
In this case, we'll set this x here and we're going to set this to V/n*1-c.*
Now, you already know most of the constants here, so you know n--n is the number of objects
and V is the sum of all values.
Now, c is some value I'm not yet going to specify to you.
We'll just say for now that c lies between 0 and 1.
Now, using these values here, if you use the knapsack algorithm using the table,
my question to you is what is the running time now of the knapsack algorithm using these values here.

So the original running time here was O(n*V) since the V was this value here and now*
since we scaled it down using this factor here what we now have to do is we have
to replace V by V/V/n(1-c), which is the same as O(n²*1/1-c), which is the same as this one here.*
Now, of course, in order to be very correct here, we have to constraint c so that it cannot be 1;
otherwise, this running time here would be undefined--well, that's okay.
As you will soon see that actually makes quite a lot of sense.
So you've seen that we now have made the running time independent of V
and that is of course a very cool thing because V originally was what could make this algorithm
run in exponential time and now we can use this factor 1-c here to control the running time.
So the running time will, of course, again become exponential if c becomes very close to 1,
but it will be polynomial if for example we set c to 0
and something in between when we set c to other values.
Now, I still haven't told you what the c here is going to be about, but we'll soon figure that out.

Now, of course, using the scaling process here, we kind of allow the algorithm to make errors and what I would now like you to think about is what kind of error are we introducing by using this scaling method. So dividing by that quantity here and then rounding down.

And there's two answers that are correct here--namely, the first two ones. So the error actually comes from the rounding because as we discussed before-- if we did not round, then the table size will be changed and also the solution that we get. So the objects that we picked wouldn't change, but through these rounding down here, what can happen is that two objects that actually have a different value now appear to the algorithm to have the same value. So the algorithm say of objects V? and V? and we're going to do an example here to say-- V? has a value of 8 and V? has a value of 9, and we're now choosing this x here to be 2, then V? gets the scale down value of 4 and V? gets the scale value of 4.5. Now, assuming that both objects have the same size, it would be better to take V? instead of V?, but now if we round down, both objects get a scales down value of 4, so the algorithm might pick V? over V?, although that is sub-optimal, but the algorithm can't tell anymore because both objects in terms of value look the same to the algorithm. The cool thing is that we can actually specify and quantify how large that error is going to be. So this is not right. We can't specify it and this is exactly what we are now going to do.

The optimal solution to any knapsack instance contains at most n objects. Now, if we introduce this rounding error, what could happen is the following-- You have your n objects here and the algorithm tells you that it's best to put in this one here, this one here, this one here. However, if we were not to round, it could be that it would have been better to take this object here instead of that one and maybe this one here instead of that one. So you pick sub-optimal objects to be in your solution. If we have an object here that we should have taken and let's call that object--object A and there's another object here that we have taken instead, what is the maximum amount of value that we could have lost. Well, the algorithm will only take A instead of B if both objects in terms of their value make the same to the algorithm. So if the value of the A divided by this quantity x here, round it down is the same as the value of B divided by X also rounded down. Or in other words, Va - Vb is equal to 0. Now, my question to you and this is going to be a little of a challenge-- If I removed these running downs here so I consider the original quantities without any rounding, what then is the maximum difference between these two fractions here. And please enter your answer here.

And the answer here is because these two fractions round down to the same value
their difference must be smaller than 1 because as soon as their difference is at least 1
they will round down to different values.
Also, you should notice that we should've taken A instead of B that this is positive.
So the whole thing here is also greater than or larger than 0.
So x is a positive value. So we can just multiply this inequality here with x.
And what we get then is this.
We get that the value of A minus the value of B, so the difference between the two, is smaller than x.
And now let's take this information here together with this information.
An optimum solution contains at most n objects
and the mistake that we make for each object is at most x.
So our maximum total error of the solution that we get versus an optimal one is n times x,
the number of objects times the mistake that we could've made for each object.
So let's take this information here and put it right under the running time, n times x.
And of course that is an absolute error. It's not a relative error.
So let's turn this absolute error here into a relative error because that is what we want to know
for an approximation algorithm.
What this means here that the absolute error is at most n times x,
it means that the approximative solution is at least as good as an optimum solution
minus n times x.
Now of course we want to turn the absolute error into a relative error here
because for approximation algorithms we're always talking about approximation factors.
Let's divide both sides here with a value of an optimum solution.
So we take this away, turn this into a 1, this gets divided by the value of an optimum solution,
and this one here get's divided by the value of the optimum solution.
Now, let's rewrite this nicely.
Now since the optimum solution can be at most V if we take all objects here.
We can write it also like this 1 - n*x/V and we also know what x is.*
X is V/n (1-c). So this is 1 - (1-c) and this is equal to c.
So the ratio of the approximativee solution to the optimum solution is at least c.
And now we have an approximation factor which means that c, the constant that I didn't tell
you about in the beginning, is actually the approximation factor.
So you can see if we want to achieve an approximation factor that is very, very, very good,
that means that c is very close to 1.
But if c is very close to 1, then the running time of this algorithm here is quite large.
Because if c is close to 1 this down here, this part down here with the fraction becomes close to 0
and then the running time of the algorithm of course increases.
Whereas if you're content with an approximative solution that is only so-so,
then you can choose c to be closer to 0, which also means that you get a better running time down here.
And this is also why this is a PTAS, a polynomial time approximation scheme.
You can get as close as you want to in an optimum solution but you have to pay in running time for that.
The running time is not only polynomial in n,
but also polynomial in the approximation factor that we want to achieve.
This is why such a PTAS would actually be referred to as a
fully polynomial time approximation scheme as opposed to a general polynomial time
approximation scheme where c can also appear in some exponents.
I still sometimes find it mind boggling that Knapsack is np-complete meaning that
in terms of intractability Knapsack is likely to be just as hard to solve
as nastier problems such as SAT or clique.
So approximation I think shows very well that even with an np-completeness
there's a broad spectrum of difficulty, if you will.
So some of the most difficult problems to approximate
appear to be clique and independent set and then in the middle of the spectrum
we have problems like vertex cover.
And then we have problems which have a PTAS
or even a fully polynomial time approximation scheme such as Knapsack.
All of these problems here are np-complete but some of them can be approximated
with no constant factor at all.
Others have algorithms that if you don't look closely
it could almost be mistaken for being polynomial.
Of course all of this is still under the assumption that p does not equal np.
If p equals np, then all of these problems here would be polynomial time solvable anyway.
Now that we have looked at approximation,
what else is there that you could try to tackle np-complete problems?
Next time, we'll be basically poking around meaning that we'll be looking at
how randomization and random algorithms could maybe help us tackle these problems here as well.
So see you next time.