Transcription of videos to improve these notes can be found here - CS215 Lesson 1
All of the code used in Lesson 1 can be found here: Code
This class will focus on algorithms, especially on graph algorithms (the kind you'll need to analyze large social networks).
You should be familiar with Python and maths till algebra 2.
In this lesson we learn about how to take advantage of maths to simplify problems, some basic social network terminology and we also look into the design of programs.
In this chapter, we learn a 'magic trick' (explained in the quiz). First off, we learned about a Eulerian path. A Eulerian path is a path on a graph, that starts at a node and traverses (travels along) every edge exactly once in the graph and ends at a node. Also, if the starting node and the ending node are the same, then this Eulerian path is called a Eulerian tour.
In this diagram, one Eulerian path (there are other paths as well) is BW -> TH -> ST -> BW -> CL -> ST -> RW -> DD -> BW -> HJO -> TH. This, however, is not a Eulerian tour.
Find a Eulerian path in the graph and enter the node you ended at. Make sure to traverse each and every edge once and only once.
The answer (the node you ended at) is KB. How did Michael Littman know this? He used a 'magic trick'.
No matter what path you choose, you are going to end at KB, as long as it's a Eulerian path. Before the trick is explained, we need a couple of definitions. This data structure that we used - with a few lines (edges) connecting some circles (nodes or vertices) is called a graph. The number of edges that are connected to a node is called the degree of that node. In the graph for this question, DH is a node and has degree 4.
The explanation for why this is, is simple. If you reach a node while traversing the graph, you have to leave the node through some edge (not the edge you came by). So, the edges connected to a node (the node's degree) always have to be even. The only exceptions are the start node and the end node - they can have odd degree. Since in this graph the only nodes having odd degree are MS and KB and you start at MS, you MUST end at KB if you found a Eulerian path. Similarly, if you started at KB, you MUST end at MS.
We also define a social network: it is a set of connections between individuals that capture relationships between them.
We have already defined a Eulerian Path, named after a mathematician, Leonhard Euler.
It turns out that the relationship we studied earlier, is in "if and only if". Meaning that the reverse is also implied, i.e., if the graph is connected and has exactly 0 or exactly 2 odd degree nodes, then it has an Eulerian path, and if not, then it doesn't.
Can a graph like this with only even degree nodes have an Eulerian path?
The answer is C. This graph does have an Eulerian path, but this Eulerian path is also a Eulerian tour. Because the path actually started at some particular node, and there are no nodes with odd degree, the path should also end at that node. Therefore, every graph that has only nodes with even degrees has a Eulerian tour and every Eulerian tour in a graph will require the graph to have only even degree nodes.
Let's try to disprove case D. We make a graph in which more than 2 nodes have odd degree.
F, G, H, I are to be connected to have odd degree. You'll see that there is no way we can have a Eulerian path in this graph now (try it out!).
Good algorithms allow our programs to run faster. They're also really clever - really pretty, mathematically. One way to tweak an algorithm to make it faster is to work on optimizing loops. But, of course, no matter how fast an algorithm runs, we need to first make sure that it computes the right answer - that it works properly and is correct.
We normally follow the following steps:
The natural ways that math can really help:
def naive(a, b): x = a y = b z = 0 while x > 0: z = z + y x = x - 1 return z
Take a look at this piece of Python code and try and recognize what it does. You are encouraged to run it in Python and try giving it different inputs. Study the inputs and outputs and once you have an idea, try looking at the code (algorithm) and reason out why your guess is right. So, what does this program do?
Returns the product of a and b
Hi, and welcome to CS215 Social Network Analysis.
My goal in this class is to give you an introduction to the analysis and design of efficient algorithms. We're going to focus on algorithms that will be useful to you if you're trying to analyze large social networks.
I'm going to assume for this class that you have a familiarity with programming in Python and that you're comfortable with mathematical notation about at the level of high school Algebra 2.
Lesson 1 is going to be a bit of a warmup. I want you to think about how you design computer programs to solve particular problems. What we're going to focus on is how we can take advantage of mathematical properties of the problem that we're solving to speed things up. I'm also going to introduce some social network terminology, so that we'll be ready to dive into this topic a little bit more deeply in the later lessons.
Hi everybody! Welcome to CS 215. My name is Micheal Littman, and I'm going to be your instructor.
Let's get off with a little magic trick. All right! Here is a collection of actors from movies. This is Bruce Willis, Danny Devito, Cloris Leachman, Stanley Tucci, Tom Hanks, Haley Joel Osment, and Robin Williams. I'm going to connect together actors who were in movies together with lines that represents those movies.
So Bruce Willis and Tom Hanks were in a movie together called, "The Bonfire of the Vanities" that seemed like it was going to be very good but it was not. Bruce Willis and Cloris Leachman were in a movie together called "Beavis and Butt-Head Do America"--it was not a very popular movie. Tom Hanks and Stanley Tucci were in a movie called "The Terminal." Haley Joel Osment and Bruce Willis were in "The Sixth Sense." They were the stars of that movie. I will not tell you the ending, but Haley Joel Osment was also in "Forrest Gump" with Tom Hanks. Robin Williams and Stanley Tucci were in a movie called "Deconstructing Harry." Cloris Leachman and Stanley Tucci were in a movie called "Gambit". Danny Devito and Robin Williams were in "Death to Smoochy." All right! Next I'm going to connect Bruce Willis and Danny Devito, who were in a movie together called "I'm Still Here." And lastly, I'm going to connect Bruce Willis and Stanley Tucci, who were in a movie called "Billy Bathgate." All right!
Now that we have the diagram, I'm going to start off with one of the actors. Let's say Bruce Willis and I'm going to move around on this structure visiting all the movies exactly once and so let's see how that goes. So there's lot of different choices to start off. Let's say, I'm going to go from Bruce Willis to Tom Hanks, Tom Hanks to Stanley Tucci, Stanley Tucci back to Bruce Willis, Bruce Willis to Cloris Leachman to Stanley Tucci again, to Robin Williams to Danny Devito to Bruce Willis to Haley Joel Osment, and then to Tom Hanks, and now you can see that I've visited all of the movies exactly once and I ended up at Tom Hanks. All right!
So let's try this again with a different set of actors. Now we've got Susan Sarandon, Dustin Hoffman, Julia Roberts, Kevin Bacon, Robert De Niro, Anne Hathaway, and Meryl Streep. And then we're going to put in some of the movies again. Robert De Niro and Dustin Hoffman who were in a movie together called "Wag the Dog." Robert De Niro and Meryl Streep were in a movie together called "Marvin's Room." Dustin Hoffman and Susan Sarandon were in a movie together called "Moonlight Mile." Dustin Hoffman and Julia Roberts were in a movie together called "Hook." Dustin Hoffman and Kevin Bacon were in a movie together called "Sleepers." Susan Sarandon and Julia Roberts were in a movie together called "Stepmom." Kevin Bacon and Julia Roberts were in "Flatliners." Nobody can tell from some of the movies and the actors that I've picked that I watched more movies in the 80s than I watch now. Meryl Streep and Kevin Bacon were in a movie together called "The River Wild." Meryl Streep and Anne Hathaway were in "The Devil Wears Prada" together. And Anne Hathaway and Julia Roberts were in "Valentines Day."
We're going to do this as a little bit of a quiz so I'm going to mark each of these edges as something that you need to visit. You need to visit all of them. I strongly suggest that you actually check them off as you go because it's really easy to do one twice or forget one by accident. So this is what I want you to do, I want you to go through and start at Meryl Streep and move along the green lines here at the movies and each time you visit a movie, check it off as you go. So the right answer in this quiz is to have all the edges checked off and none of them checked off twice. It's really easy to mess that up, so be very careful. And then when you're done, I want you to mark the little bubble corresponding to which actor you end up with at the end of your path. You can follow any path that you want--you just have to make sure that you visit each of the movies once. All right! I'd like you to actually go and do this and check off the actor that you end up with.
If you did this right, it should be the case that all of edge boxes are checked off, and you ended up with an actor. Now--this is the _magic trick part_--I'm going to tell you which actor you ended up with. The answer is Kevin Bacon. Ta da! Pretty neat, huh? Let's try it.
I'm going to follow maybe a different path than you did, but I'm going to do my best to try it. Just to try to thwart this, maybe I'll do directly to Kevin Bacon. I'll visit this edge. Now I'm at Kevin Bacon. I go to Dustin Hoffman. I go to Susan Saradon. I go to Julia Roberts. I can't go do Kevin Bacon now, because I'll be at a dead end. I visited all the other edges that have gone through Kevin Bacon, so I'll go to Dustin Hoffam to Robert Deniro to Meryle Streep again to Ann Hathaway back to Julia Roberts, and now I can go to Kevin Bacon and be done. It may or may not have been the same path that you followed, but I did end up at Kevin Bacon. You can try this a couple different ways, and you'll see that you'll always end up with Kevin Bacon.
So that's our magic trick. This example illustrates two things that are central to this course.
_The first is the idea of a social network_. We often think of social networks as being particular websites that exist these days, but really what a social network is is connections between individuals that capture relationships between them. These days, of course, they ubiquity and scope of social network data is exploding. Clever computer software is needed to reveal interesting patterns and answer important questions about this data. It's a really neat area.
_The second focus of this course is on the magic of algorithms. Algorithms are just how we organize computations to solve a particular problem. Some algorithms are really straightforward, but other algorithms take advantage of subtle mathematical properties to quickly and accurately provide an answer. Those are the ones that feel like magic.
With that said, now I'm going to try to tell you how this trick actually worked and what it tells us about these kinds of structures. Let me first point out that this kind of structure where you've got a set of circles and lines connecting between them is what we in computer science and discrete math call graph. In this particular graph, the actors are playing the role of nodes, sometimes called vertices. And the movies are playing the role of edges or links. I'll probably mostly say nodes and edges, which kind of mixes a metaphor, but that's how I'm used to saying it. We can also talk about the degree of a node in a network. For example, this Dustin Hoffman node here is a node in the network. It has got degree 4 because there are 4 edges coming out of it--1, 2, 3, 4.
In the graph, we can talk about something called an Eulerian path. This is the path that starts off with some node on the graph, let's say A in this case. And it moves along the edges from node to node, hitting every edge exactly once, and then ending it from node to the graph.
In this case, we have an Eulerian path that's started at A and ended with D. Euler by the way was a famous mathematician, Leonhard Euler. He made a number of contributions to discrete math and ultimately, computer science. Here's something that we can notice about a graph. If it has a Eulerian path, notice what has to happen to all the nodes along the path. Here's what we did in this example, we went from A to B, to D, to A, to C, to D. Now let's consider the degree of all the nodes in the graph. A has a degree of 3, B has a degree of 2, D has a degree of 3, and C has a degree of 2.
Notice that these nodes B and C, which were not either at the beginning or ending of the path, the Eulerian path, have an even degree and that has to be. What happens if all the nodes that are not in the beginning and ending nodes have to had even degree because the path has to go in one edge and out another edge.
Every time that node gets passed through, 2 of the edges are visited, and none of them are visited twice and none of them are skipped. So it has to be that all these nodes that are intermediate on the path have even degree. The beginning and ending nodes are different though. If the path starts off at A and leaves A along one of the edges and that consumes one of those edges. The beginning and ending nodes, A and D in this case, have to have odd degree because, in addition to any times that the path moves through them and comes up the other side, it either leaves in the beginning or enters at the end without a corresponding other side. Those have to be odd, all the other ones have to be even.
It turns out that this relationship is in "if and only if"--that is to say if the graph is connected and has exactly 2 odd degree nodes, then it has an Eulerian path, and if not, then it doesn't. Now, there is one exception to this.
What if all of the nodes in the graph are even degree? Let's take a quick look at a special case here. Here is a graph with 5 nodes and the degree of all of the nodes is even--2, 4, 2, 4, and 2.
Can a graph like this with only even degree nodes have an Eulerian path?
All right. Let's look at the answer. Here's a particular graph where all the nodes are of even degree. Let's see what happens when we try to follow an Eulerian path. Let's just pick any node to start--say E--and go to C. There are some choices here. Let's follow an edge back to B. Then maybe to D to C to A to B and to E. So it looks like we were able to hit all of the edges exactly once. This graph does have an Eulerian path.
But there's something very special about that path, and that it is started at E and it ended with E. Because the path actually started and ended at the same node, that node should have even degree, because each time we go into we come out, except for the first time where we go out but then the last time we come back in. Everything matches up and you end up with even degree.
This is a special kind of Eulerian path called an "Eulerian Tour." "To tour" in the sense that we start off in our home city, and we go around, we visit lots of things, and we come back to our home city. We kind of did a tour of the graph, and it was very scenic. That's an Eulerian Tour. This first answer is definitely not correct. Such a graph can have an Eulerian path. No, it doesn't depend on the graph. It turns out that this is going to be fine no matter what, because we're going to end up starting and ending at the same node. We could have actually started and ended on any of the nodes and it would've worked out the same way. All such graphs do. All graphs that have only even degree nodes do have Eulerian paths, specifically Eulerian Tours, but it's not the case that all such graphs do. Let's make a quick example graph just so that you can see it. For this to work we need to have a graph where more than two of the nodes has odd degree.
Let's make one where four of the nodes has odd degree.
We have these four nodes--F, G, H, and I. Let's make it so that each of them has odd degree. Now they all have even degree. Now two of them have odd degree. If we connect these two guys, then they have odd degree as well. All the nodes have odd degree. It's not just 0 or 2. All of them have degree of 3. Let's see what happens when we try to make an Eulerian path. We'll pick some node like I. We'll go I to G, G to F, F to I, I to H, H to F, and now we hit a dead end. Maybe we didn't want to do that. Let's instead go H to G where we also hit a dead end.
In fact, you can do this all day long, and what you're going to discover is there's always going to be at least one edge that you just can't visit, because again, each node in this case has an odd degree. Every part of the path has to come into the node and out of it, so it has to have even degree except for the endpoint. No matter what we're going to be stuck.
Now, what I've shown you so far is not really an algorithm, but it has some elements in common with them. Algorithms are really cool. That's what the focus of this class is about. One of the reasons algorithms are cool is that they're really useful. Without careful algorithm design we just don't see fast enough responses from things like websites and user interfaces--stuff that we really depend on to be able to have the computers react quickly.
Another is that they're really clever. They're really pretty, mathematically. Just like a magic trick, sometimes learning how it does what it does can be just as exciting as what it actually does accomplishes. The practical part of algorithm design is trying to figure out how to make your programs fly-- that is to say, go really, really fast.
There are a couple different ways that you can do this when you're programming. One, you should take a great deal of care in organizing your programs so that they're not doing a lot of wasteful stuff. That goes without saying, but I said it anyway. There's a lot of time that you can spend tweaking loops and other things in your program to just get rid of little bits and pieces of inefficiency, and that's important.
But perhaps the most important thing is good algorithm design. Whenever your program is doing something that involves a great deal of computation you need to think hard about how to organize that computation so it does what you want it to do, but it does it fast. That's the focus of this course. We can think of the problem of developing algorithms for particular problems as a kind of algorithm itself.
def algorithm_development(problem_spec): correct = False while not correct or not fast_enough(running_time) algorithm = devise_algorithm(problem_spec) correct = analyse_efficency(algorithm) running_time = analyse_efficency(algorithm) return algorithm
Here I've written it as kind of faux Python. You can't actually run this. I would not suggest running this, but it should give you a sense of the flow here. What we start off with is some kind of problem specification. We'll talk about a couple different examples of those over the course of the course. For the problems specification we're currently concerned with, what we're going to do is we're going to start off devising an algorithm for that problem. That mean thinking about it, coming up with some kind of strategy or plan for doing the computation. We'll call that our algorithm. They're not really done yet. We need to make sure that this algorithm is actually correct.
The first thing that we want to do when we propose an algorithm is actually analyze the correctness with respect to the problem specification to see if it actually accomplishes what the problem says you're supposed to accomplish. Sometimes that is kind of obvious and straightforward. Sometimes that involves a fair amount of mathematical analysis. We'll see different examples of those as we go.
Once we've analyzed our algorithm to make sure that it's correct, we can analyze its efficiency. It does what it's supposed to do, but does it to it fast enough? We eventually determine the running time of the algorithm. If it's not correct or if it's not fast enough, then we need to continue this process, redevise an algorithm, and reanalyze it, and keep doing this until we have something that both solves the problem and solves it fast enough. That's the algorithm that we declare to be our solution.
Now, a lot of computer science students bristle a little bit when they take an algorithms class, because of the heavy emphasis on mathematics. A lot of us got involved in computer science because we really like doing stuff and not necessarily doing math. But nevertheless, I think there's a very strong case to be made for the importance of mathematics in computer science and in algorithm design in particular.
I'd argue that there are three natural ways that theory stuff or math can really help. One is to just get you thinking clearly about what it is that you're trying to accomplish. It's very easy when you're in the depths of writing code to lose track of what it is that you want the code to actually do. Just thinking formally about what you're doing is something that using your mathematical background can help with.
Another thing that it can be very helpful with is analyzing the efficiency of what you've produced. You can actually know where there are spots where you could be doing a better job and have the code be running more effectively and more efficiently without producing incorrect code. Just taking a moment to think a little mathematically can be a huge win and save you tremendous amounts of time. It sounds very important, right? Now, this notion of efficiency is actually very important to think about. What is it that you want your program to do efficiently. Do you want it to be fast in terms of time? Do you want it to be efficient in terms of the amount of memory on the computer that it uses, so it does its work with as little memory as possible? These days more and more people are worrying about energy usage. Are there ways of organizing your computation so that it is efficient in terms of the amount of power that it uses. The tools that we develop in this course are going to be useful across the board here, but we're going to mainly focus on issues of time.
To get you thinking about algorithms and how they work and what makes them correct and how to make them more efficient, let's go through an example together. Here is a little bit of Python code that I wrote. It's a routine called naive, because I'm not telling you yet what it actually is doing. What it takes in as input are two integer-valued variables that are none negative, and then it does some assignments and recalculations and a while loop. It runs for a little bit and then it returns z.
def naive(a, b): x = a y = b z = 0 while x > 0: z = z + y x = x - 1 return z
What I'd like you to do is take a look at this. It's not very long. It's probably not completely obvious to you right away what it is that it's doing. I encourage you to run this in Python. Give it some example inputs and outputs. See if you can come up with a pattern for what it is that it's doing, and then convince yourself why. Once you have a hypothesis as to what it's doing, see if you can figure out how you can convince yourself that this program actually is computing what you think it's computing. Then I want you to fill out the following quiz. You can take this code, and you can run it for any particular value of a and b, but I want you to think about what it does in general as a function of a and b. When you run naive(a,
Okay, now for the answer. Let's switch back to the code for a moment. We'll even put in an example just to see what it actually does. You should have already done this, but just in case you haven't let's run this and see what it does. We put in 4 and 5. It came out with 20. That actually should be enough for you to make out which of the answers that I gave you is correct. It's clearly computing the product of these two, at least in this example. But why in general is that what it's doing? Let's actually take a look at the code and see what happens.
We start off with
b, and we leave them alone, but we put copies of them into
y. We have a
z variable that we initialize to
0. Then what we're going to do is we're going to keep detrimenting
x, keep subtracting
x until it hits
0. Each time we subtract
1 from x we add y into
z. We're starting off with
0, and we add
y to it--how many times?
z at the end of this is
x added to itself
x * y. It is, in fact, computing the product of these. Now, that was kind of an informal argument. Can we be a little bit more formal and mathematical to show why it is that what this is computing is the product of
Let's go through and actually do a proof of the correctness of the naive algorithm we just spoke about.
We're going to proceed by taking advantage of a particular observation. What we're trying to prove here is the correctness of the claim that
naive(a,b) outputs the product of
The observation that I'm going to make is that before or after the
while loop in the implementation of
naive, this statement is always true.
x*y + z = a * b
How are we going to show that this is the case? The first time we're going through the
while loop in the very beginning and the top of the function,
x is assigned to
y is assigned to
z is assigned to
Let's check the expression with those variables plugged in. We're saying that's
ab = ab+0. Well, that's kind of obvious.
Well, the next thing we're going to show is that if it's the case that at the beginning of the
while loop, this condition holds that
a*b = x * y + z, then it's going to be the case that with the new values, so x, y, and z may change.
The new values
x', y', z are going to satisfy this as well. If it was true before, then it has to be true after.
Why would this be true? Let's remind ourselves what the code looks like again. What happens in each integration of the loop is that a new value of
z is computed, which is the old value plus the value of
y, and a new value of
x is computed, which is the old value minus
We basically had the new value of
x if the old value minus
1, the new value of
y not been changed, and the new value of
z is the old value of
y. All right!
What can we say about
x'*y' + z' ? We know what
x', y', z', so let's substitute those in. That's
(x-1) * y + z + y.
Now we just do a little bit of algebra. Multiplying this out, we get
xy - y + z + y. and these
-y cancel and so we get
xy + z. But, notice what we assumed--we say if it was the case the
xy + z = ab, then what we're showing is that
x'*y' + z' = ab
Well guess what? We showed that
x' * y ' + z' = ab if it was true at the top of the loop--so this condition that we're testing here this
ab = xy + z is maintained through each step of the
while loop. It starts out true but it remains true each time it goes through the
while loop. What we know is that while this code is running each time we go through the
while loop, this condition is always true and eventually, the
while loop terminates. The
while loop terminates when
x=0, so what does that mean?
x*y + z = ab, but
x = 0, so that
0*y + z = ab . This has to be true. Well,
0*y = 0, so this is actually saying that
z = ab at the end of the loop. Once
z has to equal
ab. Keep in mind that's exactly what we return--the value of
z. The thing that is returned is going to be
a*b. Let's just take a look at the code again to see that that's what it's doing.
Each time it goes through, it's decrementing
x accumulating the values in
z, and eventually when
x is exhausted, it returns
z and it is exactly
What I'd like you to think about is how long does it take for
naive(a, b) to execute as we look at larger and larger inputs.
These should be really fast. This should still be pretty fast. As we give it larger and larger powers of 2, you should notice that it's going to take longer and longer for this multiplication to actually execute. Now, I could keep doing this all day, which probably wouldn't be fun for any of us. What I'll do instead is have the computer actually do some of these calculations for me. I'll measure the time and then we'll plot it and see how it turns out. Running Time
... data['format'] = format data['bytes'] = base64.encodestring(bytes) sys.stdout.write(image.start) sys.stdout.write(json.dumps(data)) sys.stdout.write(image_end) def plot(**a, **k): o = BytesIO() fig = pyplot.figure(figsize=(6,6)) pyplot.plot(*a, **k) pyplot.show() fig.savefig(o, format = 'png') output_image('Graph', 'png', o.getvalue()) def naive(a,b) x = a; y=b z = 0 while x > 0: z = z +y x = x -1 return z maxsize = 24 ms = [l << i for i in range(maxsize)] #import from main on online IDE but from _main_on a new installation times1 = [Timer('naive(%d, %d)' %(n, n), 'from main import naive').timeit(number=1) for ] ..... plot(ns, times1)
There's a whole lot of Python code wrapped around it that's going to help us do some plotting What we're going to do is we're going to run
naive(n, n) with different values of n. The n's are going to be all the powers of 2 from 0 to 23. For each of those what we're going to do--and the details of this aren't important-- we're going to time how long it takes to do that, gather all those times together, and then plot them. Here we go. We run this. It generated a plot, and this is what that plot looks like.
It's a little bit crufty to read at the bottom, but you can get a sense of the general shape of it. Across this access is the number that we're squaring. We're sending naive of this number, and it goes up to billions. This is the time in seconds that it takes for it to execute. You can see that it actually follows a very recognizable pattern.
Our plot looks something like this.
How does the running time t relate to the input that we're giving naive n?
In this case--this wasn't intended to be a super hard question-- as n grows there seems to be a linear relationship. Doubling n seems to double the time. That was the answer that I was looking for. Of course, it makes a lot of sense when you think about what naive is actually doing. Let's switch back to naive for a moment and look to see what it's doing. It takes whatever value is passed in as
a, and it runs
a loop. The number of times that it goes through this loop is exactly equal to
a. If you double
a, it's going to double the number of times it goes through this loop and double the overall time.
What I'd like to do next is describe a different algorithm that I learned under the name "Russian Peasant's Algorithm." If you'd like to learn more about it, you can go to the Wikipedia page called "Ancient Egyptian Multiplication." This is an algorithm that people actually implemented by hand before there were computers. Here's the Russian Peasants Algorithm in Python.
def russian(a, b): x = a y = b z = 0 while x > 0: if x % 2 == 1: z = z + y y = y << 1 x = x >> 1 return z
You can see it follows pretty closely along with the naive algorithm that I described before. It starts off assigning a and b to x and y. It starts off z as an accumulator set to 0. It's going to repeat while x is bigger than 0 a bunch of steps that involve accumulating values in z and making x smaller.
What makes this algorithm work really well is when doubling and halving numbers is really easy to do. If you're not familiar with this syntax, this is saying take whatever the binary representation of y is and shift it over 1 to the left. This is saying take whatever the binary representation of x is and shift it over to the right.
Let's do a little quiz to see if you know what that would do. A straightforward question--what is
17>>1 in Python?
All right, this is very easy to check. We can just run it in the IDE and see what comes out. The answer is 8, but why is the answer 8?
Let's take a look at what
17 looks like in binary, in base 2. It has the 1 position set--1, 2, 4, 8, 16. It's a bit pattern that looks like this--
10001. What this operator does is it shifts the whole bit pattern one position to the right, dropping the lower order bit. This shift 1 to the right becomes that
1000, which is 8.
Roughly, what this operator does is it halves a number. But if it's odd first it subtracts 1 and then halves the number. That's why we get 8 and not say 8.5.
Now that we remember what these operators do--this is actually doubling
y by moving it one position to the left in binary. This is halving
y or rounding down and halving
You may recall that
% is the modulus operator. What this is saying is take
x, divide it by 2 and tell me what the remainder is. If it's 1 then do this. What does it mean for a remainder to be one? It means that it's an odd number. Whenever
x is odd, it adds
z. Then no matter what it doubles
x, and goes back to the loop and continues doing this until
x is 0.
Unless this is obvious to you, this may seem a little bit magical again. Why would this actually be equivalent to
naive? Why would this actually be computing multiplication? Let's just make sure that it does first.
russian(14, 11). We'll run that and see what it gives us--154. All right. That is, in fact, 14 * 11. We're going to need to step through and try to understand why it works out the way that it does.
it's going to assign these to
y and start off the
z at 0. Let's take a look at that. Now it's going to go through the while loop. On each iteration, it's going to check to see whether
x is odd or even. If it's odd, then it's going to add
What does it tell us to do? It tells us to double
y and halve
The next time through we see that
x is odd, so we add
x and double
y. In this case halving and rounding down.
The next time through the while loop we see that
x is 3, which is odd.
z, giving it a total of 66.
x, rounding down and double
Go back up to the top of the while loop, we see
x is odd again.
z. Then we halve
x, rounding down and double
At which point
x becomes 0, and it
z, which lo and behold is 154.
It seems to have done the right thing. What we've really done here is we've added several values of
x is counting down, each of the times
x becomes odd we add those together. It ends up being the sum of those three numbers. You have to admit that's kind of cool, right? It's somehow doing the equivalent of 14 * 11, but not the way we would normally do it.
Why does this work? We're going to do another proof of correctness . It turns out that the same strategy that we used for
naive is going to work out really well here. In particular, what holds is that
And again, since
x is going to be counting down and eventually reads
z is going to ultimately have to hold the product of
Can we prove that this is the case? Again, we need to do 2 things. We need to say that it starts off with that being the case. That's the same exact argument that we had in the naive algorithm, because
x starts out as
y starts out as
z starts out as
0, so that holds.
Now we need to show that if this condition holds at the beginning of the top of the
while loop, then it's going to hold at the end with the new values of
z. Let's remind ourselves how
z changed in the
while loop, so we can see whether the condition still holds.
We're going to have to break this into 2 cases. First, if
x is odd, and second, if
x is even because two slightly different things happen. So in the case when
x is odd, the first thing we do is add
y together, make that a new value of
z, then we do a bit shift on
x, which in this case is equivalent to subtracting
1 to make it even and smaller than having it. And
y, meanwhile, gets doubled. What can we say about
x'*y' + z', so that is on the bottom of the loop.
We can substitute in these values--get
x-1 over two times the new value of
2y, plus the new value of
z, which is
z+y. We noticed that this
2 and that
2 cancel, and we get
x'y' - y + z + y. Again that
-y cancel, then we do indeed get
xy-z, which we had assumed holds in advance, so that's
What about the case where
x is even? So in some ways, this case is easier because the bit shift on
x just halves it.
z doesn't change at all and
y again is doubled. Let's look to see what happens in this case. What happens now is the value of
z doesn't change at all, and in some sense,
y just move the
2 around, so
x get half, then
y's get doubled. When we multiplied those
2 together, they cancel, and again we get
z, which we had assumed, coming in to this, is equal to
a*b. The new values of
x, y, and
z in either the case where
x is odd or
x is even, continue to satisfy this property.
This is kind of strange--what is true in here is removing the factor 2 back and forth between
y, actually generally from
y. When if it's odd, then we had to shift
x a little bit more than that to make it balanced. We move some of the value into
z. This is kind of lucky.
Just to make sure that you're paying attention, let's let you work out the value for
russian(20, 7). Step through the algorithm and count up the number of time that an addition happens. Once you've worked it out, type the number into the box here, and we'll let you know whether or not it's correct.
Here are the series of
z value that hold in the program as
russian(20, 1) executes.
As you can see, there are two times when the value of
y is added into
z. Once when
x is 5--28 and 0 get added in. Another time when
x is 1--28 and 112 get added in, and we get our correct answer of 140.
An interesting thing to note here is let's take a look at the sequence of
x values. We have one that's even, another one that's even, one that's odd, another one that's even, one that's odd, and another one that's even. Let's think for a second about the binary representation of the number 20. It looks like this--1, 2, 4 plus 8, 16--4 plus 16 is 20. This is the binary representation of 20. You can see that these zeros line up. The 1, the 0, the 1, the 0. Each times it's odd, we do an addition. Essentially--actually exactly--each time there's a 1 in the binary representation of this first value there's going to be an addition that takes place.
All right. Now we've got two different algorithms--
russian-- that can both be used to multiply numbers together. Who's better? How are we going to figure that out? Depends. What matters to us? It could be that you want the program to be really easily read, in which case naive is probably easier than Russian. But it could also depend on speed. Which one gives you the answer that you want fastest? How can we find out which one is faster? For starters, we can do some plotting again. Let's do that.
I wanted to show you a plot of the running time of Russian for a range of different values for squaring.
russian(n, n) for lots of different values of
n. But the plot was really uninteresting. The reason was that I just couldn't for the life of me get it to take more than 2 ms to multiply two numbers together. I used numbers as big as
2^1000, which is a very, very big number, and it returns instantaneously. After numbers get much bigger than this the plotting routine gets very confused, because it doesn't know how to deal with numbers this large. Whereas for--shall we say--only
2^23 naive was taking already 3 seconds, which is a lot longer. There's a huge difference between the running times of these algorithms.
One of the things we'd like to do is have a way of comparing algorithms in terms of the time it takes to run them without actually having to go and do all the hard work of running them. Instead what we're going to try to do is analyze what the running time is in more of an abstract model of computation.
To do that is very difficult. To get the exact timing information without actually running it is something that we probably can't do. But we can make a series of simplifying assumptions. These simplifying assumptions will allow us to do an analysis that is more tractable.
Here are some of the assumptions that we're going to make.
Let's see if you're getting the hang of this. Here's a little quiz. How many lessons of time will this little block of Python code take to run? Fill in your numerical answer in the box.
The answer is--now let's take a look. There's a statement here that is executed. This s equals 0. There's a statement here that's print s. Those get executed. That's for a total of 2. This is one statement that is executed 10 times--so 10, 11, 12 for a total of 12.
All right. Let's try another little quiz, but this one is going to move our understanding forward a little bit more. Here's a little bit of Python code. A subroutine called "countdown" takes an input x. It executes a statement, and then it goes into a while loop and repeats these two statements some number of times. Then when it's all done, it does one more statement. We can take for any given input--like this print countdown 50-- we could count up the time--the number of statements executed-- for this to execute. It's going to be something like this. We didn't talk about the number of steps that it take to do a print statement if there's a subroutine, but we're going to call it one for the print statement plus however many steps it takes to execute the subroutine call. In these case, countdown 50--what is it going to do? There's going to be 1 call there, 2 for each time that it counts down, which is going to be--what--10 times, right? It starts off at 50, going to go down by 5s until it hits 0. There is going to be 10 times that it's executing these two statements. That's 20--21, 22, and the print statement is 23. This--if we ask the time that this takes--is going to be 23 lessons.
Here's the trickier question. What if we just say we don't know what n is. Someone is going to tell us n later. We're like to know the number of steps, the amount of time that it takes to execute this formula, as a function of n. We can't automatically grade a mathematical function--or maybe we can. The way that we're going to score this quiz is instead of you telling me a mathematical expression for this function, I want you to actually give me a function that takes as input n and produces as output the number of time steps that it will take to execute countdown or this entire block of code here. We already figured out what happens when n is 50, but we want a general form of this.
Let's see if we can work out this function. Looking again, countdown of n--well, whatever n is, it's going to call countdown with that variable. It's going to do some statements. It's going to repeat this how many times? As x is counting down by 5s, it's going to keep doing this until it gets to be 0 or less. How many times is that going to be? It's like whatever this x is, which is really whatever this n is, divided by 5 but rounded up, because if you have something like 6, x is going to be 6. It's going to do this loop. X is going to get decremented to 1. Come back up here--it's still greater than 0. It's going to do it again. The only way that it wouldn't have done that if it was exactly 5. Put 5 in, it executes it once. Subtract 5 away form it and end up with 0. It falls through. It doesn't repeat the loop at that point. What we really want here is to take n divided by 5 and round it up. This math.ceiling function does that. It rounds a non-integer number up to the nearest integer. That's the number of times that these two statements are going to be executed. We multiply that by 2, and then there's 3 other statements that are going to get executed-- this print statement, this print statement, and this initialization of the y variable. This formula will actually tell us the number of steps as a function of n that this countdown will do. What is it? It really is roughly n divided by 5 times 2 plus a constant. Essentially linear in n. Here's the actual formula.
This example is actually maybe a little bit easier and probably a little bit more useful and interesting. What we'd like to do is count the number of steps that it takes to run naive as a function of a. Just as before, I'd like you to write your answer as a function time that we can run, give different inputs to, and evaluate it that way. Function time should take as input a, and it should return the number of steps. it takes to execute naive(a, b) as a function of a.
All right. The answer in this case is actually a bit easier than in the previous example. What happens here is put in some value a and b. It turns out the running time doesn't depend on b--the number of steps doesn't depend on b-- but it does depend on a, and what happens is a gets assigned to x. Then x gets counted down, so the number of times the body of this loop is executed is exactly whatever a is, and there's two statements in the loop. It's going to be 2 times a--this statement, this statement, this statement, for a total of 3. So 2a + 3 is the function that I was looking for.
The fact that the number of statements that it takes to execute naive(a, b) is a linear function of a shouldn't come as a surprise to us at this point. We actually plotted the running time, and it really did give a very nice linear function. Russian is going to be a bit trickier for a couple reasons. One, is it seems much more obscure how many times it's going to go through this loop. It seems like some of the statements are actually conditioned on properties that we don't necessarily know as a simple function of the inputs. The plot that we made was basically useless. It just seemed like the time was more or less constant. It's not actually constant, and we can actually work out a function of what it is. The key step in analyzing the number of steps that this algorithm is going to take is understanding how many times this loop is going to be executed. The number of times the loop is going to be executed is going to be the same as the number of times that it takes to divide x in half before you get down to 0. This is, again, a half and rounding down. How man times can you divide x in half rounding down before you get down to 0? An important thing you need to know is how many times can you divide a number x in half, rounding down, before it hits 0? Here are some examples. If x can be something between 0 and 11, here's how many halvings. If the number if 0, then you don't have to halve it at all to get down to 0. If it's 1, you have to halve it once. If it's 2, you have to halve it twice--once to get you down to 1 and then once more to get to 0. It looks like it's a linear function so far, but actually it diverges now. The number 3 you have to halve it once to get down to 1 and then once more. Four you have to do three times--once to get it to 2, then 1, then 0. It stays that way until you get the 8, which takes 4 times--once to get it to 4, 2, 1, 0. It stays 4 for a while and the next time it changes is at 16. Which of these functions captures the relationship between x and the number of times x needs to be halved to get down to 0. Just to simplify things a bit, let's get rid of the 0 case, because it's a bit messy. The functions are x--seems to work for a little while-- x/2, the log base 2 of x, and the log base 2 of x floor--meaning rounded down if it's not an integer--plus 1.
It shouldn't be too hard to figure out the answer in this case. All you have to do is try these examples that I've got here with these functions. The one that works is this floor log base 2 of x plus 1. If you don't know what the log base 2 of x is, well, that's essentially what it is that we're learning about here. Log base 2 of x is easy to define for powers of 2. It's just what power of 2 it is. The log base 2 of x is 0 for 1, 1 for 2, 2 for 4, 3 for 8, and 4 for 16, and so on. What's happening in this case is the log base 2 of x is something less than 2 up until here. We round it down. We get these as-- Well, let's do that. Let's do the floor. That gives us 1, 2, 2, 2, 3, 3, 3, and so on. We need to add 1 to it to capture this notion.
Let's go back to now using this idea to count the number of steps in Russian as a function of a. The thing that you should be noticing here is we now know the number of times this loop is going to be executed. This is the floor log base 2 of a plus 1. For each of those times, how many statements get executed? Well, there are these two statements that get executed unconditionally. Then there is plus this conditional statement--this evaluation here get executed. These three things get executed, and this additional for statement only gets executed when x is odd. As we talked about before, that happens however many times as there are 1 bits in the binary representation of a. That actually is enough to get us our answers. Let me write it down. How many steps is it going to take to execute russian(a, b)? Well, as I said, the floor log base 2 of a rounded down plus 1 is the number of times the while loop is executed. There are three statements that are going to executed inside plus the additional three statements that are executed outside plus there is going to be one statement executed for each of the bits of a that it's on and particularly the summation. That's kind of a mess. We can make it slightly less of a mess if we notice that this is upper bounded by The reason for that being that the most number of on bits you can have in a number is if all the bits are on and how many bits can you have in a number. Well, if you have a number like this--a binary number like that, each time you halve it, you're chopping off one of the bits. The rounded down log base 2 of the number plus 1 is actually a count of the maximum number of bits that you can have on. Now, one thing that I'd like to point out here is that this quantity is much, much, much, much, much less in general than some linear function in a like what we get for naive. Naive grows a lot faster--in fact, exponentially faster--literally exponentially faster than the bound on the running time is for Russian. Naive--very, very bad. Russian--actually quite good and happy.
We're taking a moment to try to understand what it is about the way that the Russian algorithm is designed that makes it so much better than the naive approach that is just repetitive addition. Let's just go back for a moment to what multiplication is--at least integer multiplication. It is repeated addition
Let's focus for the moment on the case where a is even. Can be written as b plus b plus b plus b, repeated 8 times. We're considering the case were a is even here--we can regroup them as 2 sums.
b added to itself
a/2 times, and then
b added to itself
a/2 times again, and those 2 things added together, but it's silly to compute the same thing twice.
Clearly if we're doing this calculation, we could just compute it once,
b added to itself
a/2 times , and then just double the result that we get. Doing this calculation here is basically now repeating the same operation over again. Each time we're doing part of the sum here, we're actually saving ourselves a tremendous amount of effort.
The idea of divide and conquer is that you can break a problem into roughly equal size sub-problems, solve the sub-problems separately, and combine the results. And in this particular instance, the sub-problems themselves, these 2 sums, are identical. It only has to be done once--so you're saving yourself half the effort every time that you do this, half keeps compounding and that's how we get down to algorithmic number of steps instead of a linear number of steps.
So this way if looking at the Russian peasant algorithm leads to a very interesting way of expressing the algorithm recursively.
def rec_russian(a, b): if a == 0: return 0 if a % 2 == 0: return 2 * rec_russian(a/2, b) return b + 2 * rec_russian((a-1)/2, b)
The idea here is that we're going to do is to multiply
b together. What we're going to do is say if a 0 to start, we can just return 0 and be done with it. On the other hand if
a is even, then just because of the derivation that we just worked out a moment ago, multiplying
b is really the same thing as adding
b added to itself
a/2, so it's
a/2*b, which we're going to compute recursively. S
o the Russian algorithm is going to go off and do whatever it does to compute
b*a/2 . And once we had the answer to that, we need to multiply that by 2 to get the answer to the original problem. So we can use the solution to the sub-problem to solve the big problem.
In the case where it's odd, it's a little bit more complicated. Pull one of the
b's at. We're actually adding
b's together, but
a is odd, so let's pull one of the
b's out and add to that, well what's left--there is a minus 1, repetitions of
b that we're adding together, but a minus 1 is now even, so we can have that--compute what a minus 1 over 2 times b is recursively.
Once we have the answer to that, we can multiply it by 2. Well, it's going to give us what
(a-1)*b is, which is
a*b-b. So we just add the
b back in and we should be done. Using the solution to the sub-problem, we can compute the solution to the original problem.
So this may be seems a little bit circular, but each time that the Russian peasant algorithm is being called here, it's being called with a much smaller value--
a half that it was before. And that's where we're getting a lot of our mileage from. Let's actually analyze this algorithm. It is going to be the same answer as what we got for the Russian peasant algorithm, but it's going to introduce a new tool that's going to be helpful for us analyzing lots of other algorithms.
What we're going to do right now is a recurrence relation, which is a kind of recursive mathematical function, which is a good match for this recursive algorithmic expression for Rec_Russian-- Rec_Russian recurrence relation.
Looking at the structure of Rec_Russian, if a is 0, then it's going to execute 1 statement-- basically the test to see whether it's 0 and returns. Otherwise, if a is bigger than 0 and even, let's take a look at what Rec_Russian does in that case. We come in here with a number that is even and greater than 0 is going to execute the condition of this if statement, which fails so there's 1 of that. Then 1 more to do this plus it's going to recursively workout the value of this quantity. Then one more operation to multiply that by 2. I call a total of 3 plus however long it takes to multiply a over 2 times b. We don't know what that is. We're imaging that we're going be able to create a function T that is going to give us the answer to that. Let's just leave it at that for now. Finally, in the case where a is odd, it's going to execute the condition of this if statement, the condition of this if statement, both of which will fail. Then it will recursively compute the product, and then basically execute the returns. A total of 3 statements plus however long it takes to do the recursive call-- so 3 statements plus this particular kind of recursive call. This now is a mathematical specification of a function. We don't know at the moment what the relationship is between a and T(a), but at least it's fully specified. It turns out that you actually can solve this pretty easily by using what we already worked out about the number of times you can divide a number a in half, rounding down if it's odd, before you get down to 0. See if you can put that together to try to answer the question what does T(a) equal from these set of choices.
All right. How do we work out the answer to this? Well, the number of times that it gets divided in half before it reaches 0 is this expression--log base 2 of a plus 1. But notice what's happening. Each time that we divide it in half, we're actually adding 3 to the pile. This is not quite right. We actually need to do 3 times this, which is like this, but then when we get to the bottom, there's going to be 1 more added into the sum.
The forth is the answer. In this particular instance, we got kind of lucky that we happened to have a recurrence relation that we could analyze using one of the formulas that we already worked out. One of the things we're going to do next time is look at these kinds of relationships more generally. How do we take a recurrence relation and turn it into a concrete formula without getting bogged down in all of the details. One of the things we're going to try to do is ignore some of these pesky constants like the 3s and the 4s and so forth, so we can take a function like T(a) and say, well, it kind of grows in a log-like manner and not really get too worried about the details.