


The Internet and the World Wide Web are intimately tied to our everyday lives. They have put the information we seek at our fingertips: remedies for the common cold, dinner recipes, sports updates, developments in the Middle East, you name it, you will find it on the Internet. We know that such content is created both in the basements of individuals like you and me and by businesses such as CNN, BBC, and NBC, whose work it is to create such content. How is the content on the Internet stored and distributed? In the first two lessons of this module, we looked at the server end of giant-scale services: how computational resources are organized in the data center, how data is organized internally in the computational nodes of the cluster, and what the programming model is for dealing with the big data associated with giant-scale services and for exploiting all the computational resources that are available in the server farms and data centers. In this lesson, we'll look at content distribution networks. That is, how is information organized, located, and distributed when we are looking for it? Once again, the issue is dealing with the scale of the problem: the content is generated worldwide, and users are trying to access the content worldwide as well.


First, a trivia quiz for you: who started content distribution networks, and why? I'm giving you a bunch of choices for who may have started the idea of content distribution networks, including big players such as IBM, Napster, Facebook, Netflix, Akamai, Apple, Microsoft, and Google. And for the why part of it: was it for content distribution of world news, e-commerce information, online education (like what we're doing right now), music sharing, or photo and video sharing?


I'm sure the many trivia buffs among you will probably know that Napster was started by teenagers who wanted a way of sharing music on the Internet. That's how the CDN got started in the first place.


Let's understand what content distribution networks are. I'm planning a holiday trip to India fairly soon, and let's say that on the trip I record some nice videos of the places I went sightseeing. Let's say I call the content that I generated thusly: Kishore's India trip. To keep it simple, let me say the node ID of my computer is 80. I want to publish this video that I generated of my visits to some of the sights and sounds of India on the Internet, so that anyone on the Internet can find it and download it. Now, how do I name the content of my video? Well, textual names may not be very meaningful, because there could be name collisions. So what I'm going to do is create a content hash of the video, and let's say my content hash has enough bits to ensure the uniqueness of the hash string that I generated. I'll call that the key. Now I have a key-value pair: the key is a bit string that is a hash of the content I want to store and make available to everybody, and the value is my node ID, where the content is stored. Let's say that the content hash turned out to be 149, and my node ID is of course 80. So the key-value pair that I've generated for my interesting video is (149, 80): 149 is the key, and 80 is the value, which is the node ID of my computer where I have the content. In other words, the key 149 is a unique name that I've generated for the content which I want to distribute to others who may be interested in it. Now, if you get this key-value pair that uniquely names my video, Kishore's India trip, you can then come to my node and get it from me. Now the question is: where do I store this key-value pair so that anybody can discover it? If they are looking for Kishore's India trip, maybe I've published somewhere the fact that the unique name of Kishore's India trip is 149.
So now, if anybody wants a way of locating the server from which they can download the content, they need to get this key-value pair. So when I create this particular key-value pair, I have to find a way to place it on the Internet so that anybody can get it, and from that, know the node from which they can download the content. Now, we cannot put this on a central name server, because that does not scale: user-generated content is proliferating on the Internet, so using a central server to store all the key-value pairs is just not scalable. We need a distributed solution. And this is where the idea of DHT, or distributed hash table, comes into play. The idea is quite simple. Anyone who generates a key-value pair has to find a location to store it such that, intuitively, anybody looking for that particular key will be able to discover it. So if I want to store a key-value pair, what I'm going to do is find a node whose ID is exactly the same as the key itself, or if not, close enough to the key that I want to store. In this case, the key that I generated, which is a content hash of my India video, is 149. Looking on the Internet, I find that there is a node whose node ID is 150, close enough to 149, and therefore I'm going to store this key-value pair (149, 80) at this particular node. Now, if you are looking for my India trip video, you know that the unique signature associated with it is 149, and you will know, because of the structure of the DHT, that the place to look for the key 149 is a node whose ID is equal to the key, or close enough. So you go to 150, and from 150 you get this key-value pair; once you get it, you know that the content is stored at node 80, and then you come to me and get the content from me. This is how content distribution networks exploit distributed hash table technology to store content on the Internet, so that it can be discovered and disseminated to the users.
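The naming scheme described above can be sketched in a few lines of Python. The node IDs, the tiny key width, and the video bytes are all made up for illustration; a real DHT would use the full 160-bit SHA-1 digest rather than a truncated one.

```python
import hashlib

def content_key(data: bytes) -> int:
    """Unique name for the content: a hash of the content itself.
    Truncated to 8 bits here for readability; real systems keep all 160."""
    return hashlib.sha1(data).digest()[0]

# Hypothetical overlay nodes, identified by node ID.
nodes = [60, 80, 150]

video = b"Kishore's India trip (video bytes)"
key = content_key(video)           # the unique signature of the content
kv_pair = (key, 80)                # value 80 = node ID where content lives

# Placement: store the pair at the node whose ID is closest to the key.
home = min(nodes, key=lambda n: abs(n - key))
print(f"key {key} placed at node {home}; content stays at node {kv_pair[1]}")
```

With the lecture's numbers, a key of 149 would land at node 150, the closest available node ID.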

DHT Details

Let's dig into some of the details of the DHT technology. The first namespace that the DHT technology has to deal with is the key namespace. The way this namespace is managed is: if you have content to disseminate, you generate a unique key for the content using an algorithm like SHA-1, which generates a 160-bit key such that there will not be a collision even if different contents use the same algorithm to generate their keys. That is how you create a unique signature for a particular piece of content using SHA-1. The second namespace we have to deal with is the node namespace. Here, what we're doing is creating an SHA-1 hash of the IP addresses of the nodes that want to share content. So, let's say that my buddies and I form a social network; for all our IP addresses, we're going to use this algorithm to encode them into 160-bit node IDs. So now we've got two namespaces, the key namespace and the node namespace, and both of them are derived using the same algorithmic technique, let's say. In this case, we've created a key for the content so that there's a unique signature for a particular piece of content; similarly, we've created a node ID from the IP address, which is again a unique signature for a particular IP address. The objective is: if I have a key, I want to store that key at a node with ID N such that the key is very close to N. Ideally, if you're lucky, the key is exactly equal to N, but it's not possible to guarantee that these two hashes will result in exactly the same value. It just has to be close enough, as in the previous example I showed you: when I generated the hash 149, I stored it at node 150, which is close enough to the hash that I created. The API for manipulating this distributed hash table data structure is put(key, value) and get(key). Put takes two arguments, the key and the value; the value can be anything that you want to associate with that key. In the previous example, the value was the node ID where the content associated with the particular key is stored. And get takes one argument, namely the key, and returns the value that is associated with that key-value pair.
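The put/get interface over a closest-node placement rule can be sketched as follows. The ToyDHT class and the small integer node IDs are illustrative stand-ins: a real DHT spreads this state across machines, whereas here one dictionary per node stands in for per-machine storage.

```python
class ToyDHT:
    """Sketch of the DHT API: every key-value pair lives at the
    node whose ID is closest to the key (the placement rule above)."""

    def __init__(self, node_ids):
        # one dict per node stands in for that machine's storage
        self.stores = {n: {} for n in node_ids}

    def _closest(self, key):
        return min(self.stores, key=lambda n: abs(n - key))

    def put(self, key, value):
        self.stores[self._closest(key)][key] = value

    def get(self, key):
        return self.stores[self._closest(key)].get(key)

dht = ToyDHT([60, 79, 150])
dht.put(149, 80)            # key 149 -> content is at node ID 80
assert dht.get(149) == 80   # served by node 150, the closest ID to 149
```

Because put and get use the same closest-node rule, a get always lands at the node where the matching put stored the pair.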

CDN (An Overlay Network)

CDN is an example of what is called an overlay network. Let me explain what an overlay network is. In the previous example, I said my node ID is 80, and the node holding the content corresponding to the key is something that you discovered; once you've discovered it, you want to come to me to get the content from me. But how will you do that? What does 80 mean from the point of view of the Internet? As you know, the physical infrastructure of the Internet works with IP addresses, but what you have is 80, which is not an IP address, and the operating system only understands IP addresses for routing packets on the Internet from source to destination. So we need, at the user level, because we are doing this sharing of content at the user level, a way of mapping such virtual addresses, 80, 60, and so on, to IP addresses. So we, as friends who have decided that we want to form a social network to share information content, have exchanged such mapping information and constructed a routing table at the user level. This is what is called an overlay network: a virtual network on top of the physical network. So, for example, if you, using that SHA-1 hash, found that your IP address maps to node ID 60, you'll say, okay, my node ID is 60 and my IP address is such-and-so, and that is how I'm going to construct a user-level routing table. All I know is your node ID, and that node ID maps to some IP address, and I can use this correspondence between the node ID and the IP address to have a way of sending information to you. And maybe a friend of yours has exchanged routing information with you, and he has told you that his node ID is 80 and has also given you a way to reach him given that node ID. And you've shared that information with me as well, saying, you know, I have buddies, and these are the node IDs of my buddies.
So that's how I construct this routing table, which is really a table containing the node IDs of my friends, friends of friends, friends of friends of friends, and so on. That's how I construct this user-level routing table. Now, if I wanted to send a message to my buddy, let's say B, whose node ID is 60: because B has given me his IP address, when I send a message to his node ID, I can convert that to the IP address, give it to the physical network, and it goes through the physical network and gets delivered to B. What if I want to send a message to some other node, C? I know the node ID of C because my buddy B exchanged that with me, but I have no idea what the IP address of C is. On the other hand, I do know that B has a way of getting to C. So in my routing table, what I'm going to say is that, given the name of a particular node, I know the virtualized node ID associated with that node, 60 for B, 80 for C, and I also know what the next hop is in the user space, in the virtual space, for getting to that particular destination. In the case of B, I know I can send to him directly. On the other hand, if I want to send to C, I have the node ID of C, but I have no idea what that maps to in terms of an IP address. I do know that my buddy B has a way of getting it to C, and that's what I'm going to use. I'm going to say: if I want to send something to C, I simply hand it over to my buddy who is at node ID 60. So the routing table says, given a node ID, who is the next hop I should give it to so that it'll eventually get there? The physical network, of course, is much more elaborate than this user-level overlay network. For instance, let's say I want to send a message to C. His node ID is 80, but I have no idea how to send it to him except to know that if I give it to my buddy B, who is at node ID 60, he'll know how to get it over to C. So I'm going to send it to B, and when it comes to B, B knows,
from his routing table, the IP address for C, and therefore he will send it to C and it'll eventually reach C. So at the user level, you can see that, as far as I'm concerned, if my node ID is 50 and my node name is A, when I wanted to send a message to C, what I did was send it to B. It took two hops at the user level to go from A to C: it went to B, my buddy, and B then sent it over to C. So this is the user-level traversal of the message. But in reality, under the covers, what is going on is that when I send this message from A to B, it is going through the physical network to reach B. And when I send a message from A to C, even though it is two hops at the user level, internally, in terms of the number of network hops that may have to be incurred in the physical network for the message to eventually reach destination C, it might take many more hops. That's what an overlay network is: a virtual network on top of the physical network. And this particular overlay is a content distribution network, because it allows content to be shared and distributed among a set of users who have exchanged information with one another so that they can discover one another, if not directly, then indirectly through friends of friends.
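The two-hop traversal from A to C can be sketched as a user-level routing table. The IP address below is a made-up placeholder, and a real overlay would keep far more state per entry; the sketch only shows the next-hop resolution described above.

```python
# User-level routing table at node A (node ID 50). For each known node ID,
# we have either a direct IP address or the next overlay hop toward it.
routing_table = {
    60: {"ip": "203.0.113.7", "next_hop": None},  # B: direct neighbor
    80: {"ip": None,          "next_hop": 60},    # C: reachable via B
}

def first_hop_ip(table, dest_id):
    """IP address to hand the message to: the destination itself if we
    know its IP, otherwise the next hop on the overlay path toward it."""
    entry = table[dest_id]
    if entry["ip"] is not None:
        return entry["ip"]
    return first_hop_ip(table, entry["next_hop"])

# A message for C (node 80) physically leaves A addressed to B's IP.
print(first_hop_ip(routing_table, 80))
```

The physical network only ever sees the IP address of the next overlay hop; the node IDs exist purely at the user level.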

Overlay Networks in General

Overlay networks are a general principle, and at the OS level we already have an overlay network: the IP network, which is really an overlay on top of the local area network. You may have a particular IP address for your laptop. That IP address actually translates to a MAC address, because the MAC address is the real address used by the physical network in the local area network to communicate with other nodes on the same local area network. So at the OS level, when you send a message using TCP/IP, and that message is to be delivered to your buddy who happens to be on the same local area network, the IP address is actually converted to a MAC address so that the message can traverse the local area network and get to the desired destination. Similarly, a CDN, a content distribution network, is an overlay on top of TCP/IP. In particular, in the case of a CDN, we have this node ID, a manufactured ID at the application level, but it maps to an IP address. That is how, at the application level, when I say I want to send a message to node number 60, I can convert that node ID 60 to an IP address, to which I can send the message, and eventually it will get delivered.

DHT and CDNs

DHT, or distributed hash table, is an implementation vehicle for a content distribution network to populate the routing table at the user level. As I mentioned earlier, for placement you need a put operation. The put operation takes two parameters, a key and a value: the key is a content hash that uniquely identifies something that the user community may be interested in finding, and the value is the node ID where the content is stored. Retrieval of a key-value pair is done using a get operation. When you do a get and give the key, what you expect to get back is the value associated with that key, which was placed somewhere using a put operation. So, using the put and get operations, let's see how the routing tables get constructed as new content gets generated. That is, you want to do placement of key-value pairs using put operations, and retrieval of the value associated with a key using get operations.

Traditional Approach

The traditional approach, which I'll call the greedy approach, to constructing a distributed hash table is: when you want to place a key-value pair, you pick a node N, where N is very close to the key. And if you want to retrieve a given key K, the algorithm is going to be that you go to the node N closest to this key K, because that is the algorithm being used for placement, so for retrieval you just do the reverse. That's how these routing tables get populated at different nodes in the distributed system at the user level. So the routing table at A says: these are the peers known to me, whose node IDs I know, along with the mapping of each node ID to an IP address. Now, the node space may be much bigger than the number of entries I have in my routing table. So what do I do if I want to communicate with a node whose node ID I know, but for which I don't have a mapping from node ID to IP address? Basically, the routing table at every node is just saying: these are the nodes that I know how to communicate with directly, that is, all the nodes that are reachable from node A, because at the user level I have a mapping between the virtual node ID used in the DHT and the IP address that corresponds to it. And remember that the IP address is the only thing the operating system is going to understand. Therefore, for example, node A knows how to communicate with node 60 when it wants to send a message to 60, and node B knows how to communicate with 109. If these are the only entries in the routing tables of A and B, these are the only nodes to which they know how to communicate directly. What if I want to go to some other node that is not in my routing table yet? For example, let's say that I am trying to retrieve a key, 58 or 59.
I know that 58 or 59, in terms of the DHT construction, is most likely stored at some node whose ID is very close to this key. In my table I have a node ID 60, close enough to the key that I'm looking for. So what I'm going to hope is that, if I am looking for this particular key, 58 or 59, there is a good chance that the corresponding key-value pair is stored at node ID 60. That's the node I'm going to communicate with. It's possible that key 58 is actually stored at node ID 58, in which case my hope is that the desired destination I want to reach is known to this peer: if I want to communicate with node 58, my best bet is to communicate with node 60, with the hope that 60 may actually know how to communicate with node 58, because ultimately I'm hoping that that's where this particular key-value pair may actually be stored. On the other hand, if the key that I am looking for is 80 or 81, then I'll say, well, chances are this key is stored at node ID 79, for which I have a mapping; or, if I go to him, he might know how to get to node 80, which may actually be storing this key 80. So, in other words, in the greedy approach, in placing a key-value pair we place it at a node N where N is equal to K, ideally, or close to K. Similarly, if I want to retrieve a key K, I go to a node N where N is either equal to K, if I know how to get to it from my routing table, or close enough to the desired N. In this case, the desired N is 58 or 59, but the one that I can get to is 60, and my hope is that when I get to 60, he will know how to get me to node 58 or 59, or even better, he may be the one storing the key-value pair that corresponds to the key I'm looking for.
So, in other words, in the greedy approach we are trying to get to our desired destination as quickly as possible, with the minimum number of hops. And when I say the number of hops, it is at the level of the virtual overlay network, not in terms of the physical network, because at the user level we have no idea how many hops a message may actually take going from source to destination. All we are saying is that at the user level we are trying to minimize the number of hops to get to the desired destination.
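The greedy retrieval rule from the 58/59 and 80/81 examples above reduces to a one-line choice. The node IDs are the ones from the example, and reading "closest" as plain numeric distance is one simple way to interpret the rule for this sketch.

```python
def greedy_next_hop(reachable_ids, key):
    """Greedy rule: among the nodes in my routing table, contact the one
    whose node ID is closest to the key I'm looking for."""
    return min(reachable_ids, key=lambda n: abs(n - key))

reachable = [60, 79, 109]            # entries in my user-level routing table

assert greedy_next_hop(reachable, 58) == 60   # hoping 60 knows node 58/59
assert greedy_next_hop(reachable, 81) == 79   # hoping 79 knows node 80/81
```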

Greedy Approach Leads to Metadata Server Overload

The greedy approach leads to what is called metadata server overload. Let's understand how that happens. Let's say that there's a whole bunch of users generating content: one user generates content with a key of 149, others generate content with keys 148, 152, 153, and so on. Now they all want to place their key-value pairs; the first user's node ID is 80, the others' node IDs are something else. What will they do in terms of this greedy algorithm? They'll each try to find the node whose ID is closest to the key they want to place. Well, it turns out that 150 is the closest node ID for all of the keys we're talking about here, so all of these puts will end up going to one node, namely node ID 150. First of all, there is going to be congestion here: if a lot of content hashes to keys that are so close together, they all end up at the same node ID, 150. Remember that the actual content stays with the putter of each key-value pair, namely node 80 and its peers; what node 150 is storing is the metadata that allows everybody to discover the content provider. So not only is this node going to see a lot of traffic if lots of keys map to its node ID; the nodes adjacent to it in the overlay network are going to be affected as well, because everybody is trying to reach this one node. You can see that as we go toward this node, there is a saturation effect going on: a tree rooted at this destination node, whose intermediate nodes in the overlay network space get congested, and so on. That is what is referred to as the tree saturation problem. Now, this is for the putters. Further, consider my content, whose key-value pair is stored here: the video that I generated from my India trip.
If that gets hot, and everybody wants to find out how to retrieve that particular video, 149, they're all going to make get calls. And these get calls again end up at this same metadata server, because that's the node storing my key-value pair (149, 80). So all these gets, saying I want to get 149, I want to get 149, I want to get 149, all of them go to the same node ID, 150. Again there is congestion at the metadata server. So congestion can happen either because lots of keys cluster at one node, or because some content is so popular that everybody who wants to discover the content provider, having only the key, has to go to the metadata server to find the content provider. That results in this congestion, which is the metadata server overload problem. And the combination of these puts and gets results in what I called the tree saturation problem, the tree being rooted at the congested node and affecting all the nearby nodes in the overlay network, because they are the gateway for getting to the node that contains this particular key-value pair.

Origin Server Overload

But the problem doesn't stop there. The metadata server is overloaded, but my video has gotten popular, and so all of these users want to download the content from me, which means that everybody is coming to me, and I have an itsy-bitsy server in my basement that is going to get inundated with all of these download requests from all of the different clients that want to get this video from me. So there is a metadata overload, and there is a content overload that comes about because something became really hot and everybody wants that content. This is what is called origin server overload. In general, there are two solutions to the origin server overload problem. The first solution is to have a web proxy, and you might be familiar with this already, because every organization tends to have a web proxy so that it can limit the number of requests that have to go out of the organization. That way, if users of an organization are trying to access popular content from the Internet, the local web proxy available in the organization can directly serve that content to the requesters. But it turns out that the web proxy solution is not good enough for what is called the Slashdot effect. That is, if there is breaking news and everyone wants the content, they will have to get the fresh content; a proxy is not good enough for that. Let's say that there's a Super Bowl going on and everybody wants to watch it, or at least get the latest updates. In that case, the web proxy is not going to be good, because the content in the web proxy is cached content. You want the live content, and the live content is only available at the origin server, not at the proxies. Therefore, the origin server will get overloaded when we have such dynamic content: content resulting from breaking news, live programming, and so on. And this is where content distribution networks come into play.
The idea is that the content is automatically mirrored from the origin server at selected geographical locations, and those locations are constantly getting updated from the origin server, so that going to any of the mirrored copies is the same as going to the origin server. Then what happens is that, in the content distribution network, depending on the geographical area from which a particular user request originates, the request is dynamically re-routed to the geo-local mirror of the content, so that the origin server need not be overloaded. You may be familiar with companies like Akamai, which came about for the purpose of providing a content distribution network for popular content providers, like CNN, or broadcast television channels like CBS, NBC, and so on. What these organizations, that is, the content providers, do is get into an agreement with a content distribution network provider like Akamai, so that they don't have to worry about origin server overload, because a provider like Akamai has the solution for taking the content of the origin server and automatically mirroring it at geo-local sites so that the origin server never gets overloaded. This is good if you are a big organization like CNN or CBS: you can afford to pay the content distribution network provider to do this automatic mirroring. But how can we democratize content distribution? If I, in my basement, generate a piece of video and want to share it with the world, how am I going to make it available to everybody? If it gets hot, I don't want to get overloaded, and I want the content distribution network to work without having to pay big bucks to a provider like Akamai. This is where the Coral system comes in. This is a paper that I've assigned you to read, which has a technique for democratizing content distribution. The details of the Coral system are what we're going to look at next.
So the Coral system addresses two issues. The first issue is that if I generate some content and want to store the key-value pair associated with it in a DHT, I have to have a scalable way of doing so without saturating any particular node that serves as a metadata server; that is, we want to avoid the tree saturation effect. The second thing we want to do is avoid origin server overload. Both of those are addressed in providing a democratic solution for content distribution for the average Joe who may want to generate some content and share it with the rest of the world. We will look at the details of the Coral system in the rest of this lecture.

Greedy Approach Leads to Tree Saturation

We mentioned that the greedy approach to constructing a DHT leads to tree saturation, and the congestion happens at the node that happens to map to a lot of clustered keys. The Coral approach is very simple: don't be greedy. What does that mean? Well, in the greedy approach, when we have a key K, we try to store it at the node N whose node ID is equal to K. The Coral approach uses this as a hint, not an absolute. But you still have to have a method to the madness: if you're going to store it some place different from N, then those who are trying to discover it have to have a method to the madness of where you stored your key. We'll see how that is done. So the top-level bit I want you to take away is that in the Coral DHT, the get and put operations are satisfied by nodes that are different from the key K; the node ID N may not be close to the key, and that is why the DHT that Coral implements is called a sloppy DHT. We'll talk about the details of that in a minute. The rationale, of course, is that you want to avoid the tree saturation that comes about when a lot of keys map to a particular node ID; also, in the process, we want to spread the metadata overload so that no single node in this democratic process of helping one another is saturated or overloaded by being a good citizen. How does Coral do it? It has a novel key-based routing algorithm, which we'll describe. The basis for that key-based routing algorithm is to compute the distance between the source of a get or put operation and the destination for that operation. The destination by default is going to be the node whose ID is equal to the key that you're looking for. The distance between the source and the destination is the XOR distance, meaning the XOR of the bit pattern of the node ID of the source with the bit pattern of the node ID of the destination; the XOR of the two quantities gives you the distance between the two nodes in the overlay network space.
The reason why you want to do an XOR is that an XOR operation, as opposed to, say, a subtraction, is going to be much faster. We do an XOR to find the distance between the source and the destination that need to communicate. So, for instance, suppose the source ID is 14 and the destination ID is 4; remember that these are node IDs in the node ID namespace, which is at the level of the user. If you XOR the two, you get 10. The bigger the XOR value, the larger the distance between the source and the destination in the application namespace. And since we are dealing with fairly big numbers here, the node ID could be a 160-bit quantity, that is the reason we want to use a simple operation to get the distance. XOR is a very quick operation to implement for getting the distance between source and destination.
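The 14-to-4 example works out as follows; XOR is a single machine instruction per word, which is what keeps it cheap even for 160-bit IDs.

```python
def xor_distance(a: int, b: int) -> int:
    """Coral's distance metric between two node IDs: bitwise XOR."""
    return a ^ b

# source 14 = 0b1110, destination 4 = 0b0100
assert xor_distance(14, 4) == 10   # 0b1110 ^ 0b0100 == 0b1010
assert xor_distance(13, 4) == 9
assert xor_distance(5, 4) == 1
```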

Key Based Routing

Let's discuss key-based routing. First, we'll talk about the greedy approach. What I'm showing you here is the state of the routing table at the source node, which is 14. The key that I'm looking for, let's say, is 4, and therefore in the greedy approach I would assume that the key K, K being 4, will be stored at the destination whose node ID is 4. What this table is showing is the routing table at the source, and in particular, wherever you have a valid entry, those are the nodes that are reachable from node 14. So from the source node 14, I know how to reach node 13, node 3, node 2, node 0, and node 5. I don't know yet how to reach node 4, because that is not in my routing table at this point in time. The intuition in the greedy approach is: if I'm looking for a particular key K, in this case 4, then I know that the node likely to have that key is the node whose ID is also 4, and so that is my desired destination. What I'm going to do in the greedy approach is get as close as possible to the desired destination in the node ID namespace. If I look at the state of my routing table, I don't know how to get to 4, which is my desired destination, but I do know how to get to 5, which is close to the desired destination in the namespace of node IDs. So what I am going to do is ask this node, because I know how to reach him and he is close to my desired destination: do you have a way of getting to destination number 4? My hope is that he may know the route to get me to the desired destination, and I'll be done in getting the key-value pair that I'm looking for. That's the idea in greedy routing.
Take the minimum number of hops to get to the desired destination by using the information available in my user-level routing table, which tells me which nodes are reachable from me directly. So the objective in the greedy approach to routing is reaching the destination with the fewest number of hops. That's the key. In other words, I am optimizing my own lookup. This "me first" approach, we know, can lead to congestion, and in particular to the tree saturation that I mentioned earlier.
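A minimal sketch of the greedy next-hop choice, assuming the routing table is just a list of reachable node IDs (the function and variable names are my own, not Coral's):

```python
def greedy_next_hop(reachable, key):
    """Greedy routing: jump to the reachable node whose ID is closest
    (by XOR distance) to the key's home node."""
    return min(reachable, key=lambda node: node ^ key)

# Source 14 knows how to reach 13, 3, 2, 0, and 5; key 4 lives at node 4.
routing_table = [13, 3, 2, 0, 5]
print(greedy_next_hop(routing_table, 4))   # 5, since 5 ^ 4 = 1 is the minimum
```

It is exactly this "jump straight to the closest node I know" behavior that funnels every lookup for a hot key onto the same few nodes.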

Coral Key Based Routing

So Coral key-based routing takes a different approach. Rather than being greedy, we are going to progress slowly towards our desired destination. In particular, I mentioned that the distance between two nodes in the node namespace is given by the XOR of the node IDs. So if I look at the source, 14, and the destination, 4, we can compute the XOR distance between them. The routing table is now populated with numbers, and what these numbers show is the XOR distance from each particular node to the desired destination. So the XOR distance from my source, 14, to the destination, 4, is ten. The XOR distance from the node whose ID is 13 to the destination 4 is nine. And similarly, the XOR distance from the node whose ID is five to the destination 4 is one. So what I have in each entry of my routing table is the XOR distance to my desired destination from each of the guys that I know how to get to directly. I know how to get to 13, and to 3, 2, 0, and 5, and the table entries show the XOR distance of each of those directly reachable nodes to the desired destination. So in Coral, in each hop I'm going to go to some node that is half the distance to the destination in the node ID namespace. Recall that in the greedy approach, since I had a way of getting to node five, I went to him directly with the hope that he would get me to my desired destination. Not so in Coral key-based routing.
What we're going to do is, in each hop, go to some node that is half the distance to the destination in the node ID namespace. Now, the XOR distance between the source and the destination, 14 and 4, is ten. So in the first hop I want to go to a node that is five distant from my desired destination, half of ten. In the second hop, I want to go to a node that is half of five, that is, two distant from the desired destination. And in the third hop I want to go to a node that is one distant; after that, I am finally home. That's the idea behind Coral key-based routing. But of course, when I have a particular distance, like ten in this case, and I want to go half the distance in the first hop, I may not have a direct way of reaching the guy who is half the distance to the desired node. So let's see how Coral key-based routing actually works, given this particular example.
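The sequence of hop targets can be derived mechanically. This sketch assumes integer halving of the remaining XOR distance, which reproduces the 5, 2, 1 progression in the lecture's example:

```python
def halving_targets(src, dst):
    """Target XOR distances for each hop when halving the distance
    to the destination at every step (integer division assumed)."""
    d = src ^ dst
    targets = []
    while d > 0:
        d //= 2
        targets.append(d)
    return targets

# Source 14, destination 4: initial distance 10, then hop targets 5, 2, 1, 0.
print(halving_targets(14, 4))   # [5, 2, 1, 0]
```

The final target of 0 corresponds to reaching the home node itself.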

Key Based Routing in Coral

So let's look at the evolution of the routing table at the source using this Coral approach to key-based routing, where in every step we go half the distance toward the desired destination. Now, reducing the distance by exactly half may not always be possible, because I may not have a way to reach that particular node; so we reduce the distance by approximately half. Let's run through this example to illustrate how Coral key-based routing works. The XOR distance between the source and the destination is 10, so the target for my first hop is a node that is five distant from the desired destination. The node that is five distant from the desired destination is node one, because the XOR of one and four is five. So that is my target, half the distance to my desired destination. But unfortunately I don't have a direct way of reaching node one, because I don't have that entry in my routing table. Therefore I'm going to go to a node that is approximately half the distance. I could have gone to either node two or node zero, both close to the desired half-distance metric, but to make forward progress towards the desired destination, I'll go to node zero, which is four distant from the desired destination. So I make an RPC call to this node, because I can reach him, and I'm going to ask him: hey, I am looking for someone who is two distant from my desired destination, which is four. The two-distant neighbor is node six, since six XOR four is two. But even this node that I reached, four distant from my desired destination, doesn't know anyone that is exactly two distant from the destination, and so he is going to respond to me and say:
I don't have exactly what you're looking for, but I have information about nodes four, five, and seven, which are close enough to who you're looking for. That's the information I get back. So when I get the response from this node, what I'm getting back is information on how to reach nodes four, five, and seven. I'm now going to populate my routing table, evolving it to a new state which shows that, in addition to what I started with originally, I now have information about how to reach node four, node five (which I already had), and node seven. So you can see that I evolved the table by adding two more reachable entries, namely nodes four and seven. What do I want to do next? Well, the next thing I want to do is go to someone who is two distant from the desired destination, because I reduced the distance by half and now I want to reduce it by half again. That means I want to get to a node that is two distant from the desired destination. But I don't have a way of reaching him, so I'm going to go to a node that is close enough to the desired target, which is two distant from the desired destination. I don't have an entry for him; I could either go to the guy that is three distant from the desired destination, or the guy that is one distant. Obviously I want to go closer to the destination, so I'm going to go to the one-distant guy. Normally, if I had reached the guy who is two distant from the destination, I would have asked him: I'm looking for someone who is one distant from the desired destination. But in this case it turns out that I've already reached the guy that is one distant, because I did not have an entry for the guy who is two distant from the desired destination.
So this guy gets my request saying I'm looking for somebody who is one distant from the desired destination, and he looks at his routing table and says: these are the nodes I know of that satisfy the criterion you're looking for: 4, 5, and 7. That's what I get back. Notice one thing happening here: after my first hop, I got back node IDs 4, 5, and 7 as the response to my request for somebody who is two distant from the desired destination, because that node gave me approximately the nodes that have the characteristic of being two distant from the desired destination, including node four itself. So my table evolved at that step with a direct way of reaching node four as well. But notice that I'm not being greedy here, because in the second step I want to reduce my distance only by half from the first step; that is, I want to go to somebody who is two distant from the desired destination. That's what I did, and I got back this response. When I get the response back, I'm going to populate my table again, but it turns out I did not get any new information at this step, because I already had the information about four, five, and seven at the end of my first RPC call. Now I have a way of reaching the desired destination, and in my conscience I can feel good that I'm not being greedy. The target distance is now zero, so I can make the call directly to the desired destination and get back the response I want. So you can see that in Coral's key-based routing, using this idea of reducing my distance to the desired destination by half at every hop, the latency I experience in order to reach my desired destination is increased, because I have to take a larger number of hops to get there. Even if I have a direct way of reaching my desired destination, I am not taking it.
I am actually reducing my distance by half every time in order to get to my desired destination, and the reason is that I'm placing the common good above my own latency. I'm avoiding the tree saturation that can occur at the destination if everybody is greedy. That comes at the cost of increased latency, but that's okay; what we are shooting for is the common good.
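One plausible way to model the next-hop choice from this walkthrough: among the nodes I can reach directly, pick the one whose distance to the destination is as close as possible to the current target distance without overshooting past it. This is a simplification of what Coral actually does, but it reproduces the hops in the example:

```python
def coral_next_hop(reachable, dst, target_dist):
    """Pick the reachable node whose XOR distance to dst is as large as
    possible while still being at most target_dist (make forward
    progress, but do not jump greedily past the halfway target)."""
    candidates = [n for n in reachable if (n ^ dst) <= target_dist]
    return max(candidates, key=lambda n: n ^ dst) if candidates else None

# Hop 1: target distance 5; node 0 (distance 4) is chosen, not node 5.
print(coral_next_hop([13, 3, 2, 0, 5], 4, 5))        # 0
# Hop 2: after learning about nodes 4 and 7, target distance 2 picks node 5.
print(coral_next_hop([13, 3, 2, 0, 5, 4, 7], 4, 2))  # 5
```

Contrast this with the greedy rule, which would jump straight to node 5 on the very first hop.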

Coral Sloppy DHT

Now let's discuss the primitives available in Coral for manipulating the sloppy DHT, in particular the put and get operations. The primitives are exactly the same as in an ordinary DHT; it's just that the semantics of put and get, in terms of how they are actually implemented, are very different. Put takes two parameters, key and value. Key is the content hash, and value is the node ID of the proxy with the content for that particular key. Essentially, this put is announcing the willingness of the proxy to serve the content whose signature is key. The put can be initiated by the origin server with the new content, or it can be initiated by a node that just downloaded the content and wants to serve as a proxy to reduce the load on the origin server. In both cases, they will want to do a put operation. The result of the put operation is to store this key-value pair in some metadata server, that is, some node that is going to serve as a name server and can answer queries that come in saying, "I am looking for this key." In that case, the metadata server can return the value associated with that key. So what we need to do when we do a put operation is place this key-value pair in an appropriate node. Now, what do we mean by an appropriate node? Ideally, given a key, we want to store it at the node whose ID n is equal to the key. That's the desired node where we want to place it, but we want to do this without causing metadata server overload. Now, how do we determine whether a particular node is overloaded? We define two states. One state is called "full," and what that is saying is that a particular node, say node n, is already storing L values for a key. Remember what I said earlier: this key-value pair can be placed either by the origin server that is creating the content, or by a proxy who is saying, I'm willing to serve as a proxy for the content.
So there could be lots of nodes that have the content and are willing to serve it. All of them would have done put operations. So the node that exactly matches the key being put may already have quite a few candidates that are willing to serve as content providers for that particular key. That's what the parameter "full" captures: I'm willing to host up to L entries for this key-value pair; anything more than that, I'm not going to do, because I'll get overloaded. So full is a space condition, saying I'm willing to entertain up to L content providers for a particular key K. The second way a particular node may get overloaded is if it actually starts getting a large number of requests for a particular key. This is a time metric: a node has a beta parameter, and beta is the number of requests per unit time for a given key that the node is willing to entertain. So if the node says, "I'm already entertaining beta requests for this particular key," then if you want it to store the same key, it will say no, I cannot do it, because I'll get overloaded; that state is called "loaded." In short, full is a space metric saying how many values I'm willing to store for a particular key, and loaded is a time metric saying how many requests per unit time I'm willing to entertain for a particular key. Those are the two metrics we're going to use in deciding whether to place a key-value pair at a particular destination node.
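The two overload conditions can be captured as simple predicates. The thresholds here (an L of 4 values and a beta of 12 requests per unit time) are illustrative placeholders, not values from the Coral paper:

```python
L_VALUES = 4        # "full": at most L values stored per key (space metric)
BETA_REQUESTS = 12  # "loaded": at most beta requests per unit time (time metric)

def is_full(values_stored_for_key):
    """Space condition: the node already holds L values for this key."""
    return len(values_stored_for_key) >= L_VALUES

def is_loaded(requests_this_window):
    """Time condition: the node is already fielding beta requests
    per unit time for this key."""
    return requests_this_window >= BETA_REQUESTS

print(is_full([30, 60, 75]))   # False: only 3 of the 4 slots are used
print(is_loaded(20))           # True: 20 requests exceeds the beta of 12
```

A node refuses to accept a new key-value pair for a key when either predicate holds.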

Coral Sloppy DHT (cont)

So let's say I'm a proxy, and I want to put a key-value pair into the Coral DHT. I'm going to use the key-based routing algorithm to place the key at an appropriate node. What does that mean? Well, I'm going to take the key, and even if I know the node whose ID equals this key, I'm not going to go to him directly. Remember that the Coral key-based routing algorithm reduces the distance by half at each step. So I'm going to go to a node that is half the distance to the desired destination: first half the distance, then a quarter, and so on, until I get to the desired destination. As I'm making these calls, progressing towards the desired destination node using the key-based routing algorithm that uses the XOR distance between the source and the destination, halving the distance at every step, what I'm going to ask each node is: are you loaded or full? Those are the two states we talked about. If the node says, "no, I'm not loaded or full, and here is the next hop you can go to," I keep going forward. If none of the nodes say they are loaded or full, I will eventually reach my desired destination and place the key-value pair there. However, as I make these hops, somebody along the way may say: look, for this particular key you're trying to place, I'm already full or loaded; one of those two conditions already applies to me. If that is the response I get back, then what I infer is that the rest of the path toward the destination is all clogged up because of tree saturation. Therefore, I'm not going to even try to place the key-value pair at the desired destination, and not at this guy either, because he's also loaded or full for the same key.
And so I'm going to retract my steps and choose the previous node as the place to put the key-value pair. In other words, when I do a put operation, there are two phases to it. The first phase is the forward phase. In the forward phase, we go to the guy that is half the distance to the desired destination and ask him: are you full or loaded? He says no. Then you go to the next guy, who is even closer to the desired destination. He says he's not full or loaded either. Then you go to the next guy, who is even closer, using the key-based routing algorithm I described earlier. You keep going until you hit a node that is either loaded or full. At that point you know you don't want to go any further towards the desired destination, because all of the nodes beyond are either loaded or full for this particular key. Therefore, the second phase of the algorithm is to retract my steps, go back, and ensure that the previous node is still willing to host my key-value pair. Why would he change his mind? Well, between the time I was making my forward motion and now, it is possible that this node became full or loaded. So I recheck the condition, and if he says he is still good to host my key-value pair, I choose this node for the put operation. That's how the Coral put operation works. So you can see that we are not storing the key at the desired destination, as a greedy algorithm would. In the sloppy algorithm of Coral, we choose an appropriate node that is neither full nor loaded, so that it can entertain requests for retrieving this particular key-value pair. We've avoided metadata server overload by doing this key-based routing in the forward path during the put operation. The get operation works similarly.
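The two-phase put just described might be sketched like this, with nodes modeled as dictionaries and a made-up capacity of two values per key (names and thresholds are assumptions for illustration):

```python
L_MAX = 2   # assumed "full" threshold: values stored per key

def full_or_loaded(node, key):
    return len(node['store'].get(key, [])) >= L_MAX or node.get('loaded', False)

def sloppy_put(route, key, value):
    """route: nodes along the key-based path, ending at the key's home node.
    Phase 1 (forward): walk toward the destination until a node reports
    full or loaded. Phase 2 (retract): walk back and place the pair at
    the last node that is still willing."""
    forward = []
    for node in route:
        if full_or_loaded(node, key):
            break                            # everything beyond is clogged
        forward.append(node)
    for node in reversed(forward):
        if not full_or_loaded(node, key):    # recheck: state may have changed
            node['store'].setdefault(key, []).append(value)
            return node
    return None                              # no willing node on this route

# The home node for key 100 is already full; the pair lands one hop earlier.
route = [{'store': {}}, {'store': {}}, {'store': {100: [30, 31]}}]
sloppy_put(route, 100, 60)
print(route[1]['store'])    # {100: [60]}
```

The recheck in the retract phase is what guards against a node becoming full or loaded while the forward phase was still in flight.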
That is, given a key that I'm looking for, I'm not going to go directly to the destination that might be hosting it, as would happen in the greedy approach. Instead, I go to a node that is half the distance to the key I'm looking for. When I do that, my hope is that I'll find the key somewhere along the way, because some node may already be serving as the metadata server for that particular key. If not, I will go all the way to the destination; if nobody has retrieved that key before, it will be available at the desired destination, and I'll get it from there. But the hope is that if a piece of content is popular enough, then multiple proxies will have gotten the key-value pair, and once they have gotten the content as well, they will have put their own node IDs as potential sources for the content. So the metadata server we hit when looking for a particular key need not be the destination whose ID exactly matches that key.
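The matching get follows the same route but stops at the first node holding the key, which is what lets intermediate metadata servers absorb lookups for popular content. Nodes are again modeled as dictionaries with a 'store' map; the names are assumptions for illustration:

```python
def sloppy_get(route, key):
    """Walk the halved-distance route toward the key's home node and
    return the first stored value found; for popular keys an intermediate
    metadata server answers before the home node is ever reached."""
    for node in route:
        values = node['store'].get(key)
        if values:
            return values[0]   # node ID of some proxy serving the content
    return None

# An intermediate node caches (100, 60); the home node holds (100, 30).
route = [{'store': {}}, {'store': {100: [60]}}, {'store': {100: [30]}}]
print(sloppy_get(route, 100))   # 60: answered before reaching the home node
```

The early exit is the whole point: the home node for a hot key is shielded from most lookups.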

Coral in Action

So now let's see Coral in action for putting and getting user-generated content, content that can be distributed in a democratic fashion, where the load of serving both as a metadata server and as a content server gets naturally distributed because of the way the Coral system manages the sloppy DHT. Let's say Naomi, who is at node 30, has some interesting content, and she wants to share it with the world. What she does is create a unique signature, a key, for this content by hashing it. Let's say the key she generated for this content is 100. Now Naomi wants to put the key, 100, and the value, 30, indicating that her node has the content corresponding to key 100. She wants to put it out on the Internet, so she uses the Coral system and its key-based routing. The node where she would like to store key 100 is the computer whose node ID is 100, but we are following Coral's key-based routing algorithm. So Naomi makes a series of RPC calls for the put of key 100, asking each node along the way whether it is full or loaded. Finally she reaches David's computer, whose node ID is 100. David says his node is neither full nor loaded; how could it be, since she just created this content? So David becomes the metadata server for this particular key: David is the right place to keep it, and he hosts the key-value pair (100, 30). Now Jacques finds out that there is this interesting video whose signature is 100, and he wants to get it. He knows that the likely place where it is stored is node 100, but once again he uses Coral key-based routing: he makes a get call, which follows the same key-based routing algorithm of halving the distance to the destination. He makes a whole bunch of RPC calls and finally gets to the destination itself, because none of the intermediate nodes have this key-value pair.
So we get to David's computer, and David says: yes, I do have the key-value pair (100, 30), and here is the value you are looking for, 30, indicating that node 30 has the content corresponding to key 100. That's what Jacques gets back. The value 30 in the response from David indicates the node ID from which Jacques can download the content corresponding to key 100: Naomi's computer. So Jacques goes to Naomi's computer and gets the content corresponding to key 100. Naomi sends the content, and Jacques is happy: he's got the content corresponding to key 100. But Jacques is a nice guy too. He says: since I have the content, I can also serve as a proxy for Naomi. What I'm going to do is put the key 100, corresponding to this content that is now mirrored on my machine, with the value 60, indicating that I'm willing to serve as a proxy for the same content. So Jacques does a put operation, and this put uses the same key-based routing algorithm. When it gets to David, David might say: look, I am not interested in holding more than one value for this particular key. If he says that, then Jacques has to retract his steps and pick an intermediate node that said it was willing, because it is neither loaded nor full for this particular key 100, and so it is willing to serve as a metadata server. We already had one metadata server, but he was willing to hold only one value, and therefore this intermediate node becomes a new metadata server for the same key 100. In this new metadata server, we've got the entry (100, 60) stored as well. So now there are two metadata servers that can potentially answer queries concerning this video 100.
Now, if a third guy, Kamal, comes to know about this cool video that is propagating on the Internet, and he finds that the key for it is 100, he can once again query the Coral system for that video. He follows the same Coral key-based routing algorithm, trying to get towards David's node, node 100. But as he does so, he hits the intermediate node, and this guy says: you know what, I've got the key you're looking for, and the value associated with this key is 60. So Kamal doesn't have to go all the way to David's metadata server. He gets the answer for his query, a value different from Naomi's address, namely node ID 60, which corresponds to the new good Samaritan, Jacques, who is also willing to serve as a proxy for the same video content. So 60 gets returned to Kamal, and Kamal can then go to Jacques and get the content from him. That way, you see that this particular video, which originated at Naomi, has now propagated to Jacques, so the origin server need not get overloaded. And of course, Kamal will turn around and become a good Samaritan himself, announcing that he is willing to serve as a proxy also; his key-value pair entry gets into yet another intermediate node. Now we've got three nodes that can serve as metadata servers for this particular key: David's computer, plus the intermediate nodes that have become part of the metadata server network for key 100. Similarly, there is no origin server overload either, because the content itself has been distributed to several proxies. All of these proxies got the content dynamically and shared their willingness to serve as proxies. So if a new node wants to get the same video 100, when it makes its get operation, that operation will traverse the network and hit either David himself or, hopefully, one of the intermediate metadata servers.
And that way, requests for the actual content may go to different content providers dynamically as the system evolves. As a result, you can see that the metadata server load is distributed, and the origin server is not stressed either. That's the nice thing about the Coral sloppy DHT approach. So the key takeaway in the Coral approach is that even though an individual request may incur a little more latency, because we're not trying to reach the desired destination directly but going through intermediate hops, in particular halving the distance to the desired destination at each step, we accept that increase in latency for the common good. In these kinds of environments, with giant scale services, big data, large numbers of users, and content suddenly becoming popular, all of this dynamism has to be dealt with in a system as vibrant as the Internet. Coral is a step towards that, reducing the stress on the origin server as well as on the metadata servers by naturally distributing the load.
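The whole Naomi/Jacques/Kamal scenario can be condensed into a toy simulation. The node IDs match the lecture (Naomi is 30, Jacques is 60, David's node is 100); the one-value capacity at David and the single intermediate metadata node are assumptions made to keep the sketch short:

```python
intermediate = {'store': {}, 'capacity': 10}
david = {'store': {}, 'capacity': 1}     # home node for key 100
route = [intermediate, david]            # path toward the key's home node

def put(route, key, value):
    """Forward until a node is full, then place at the last willing node."""
    placed_at = None
    for node in route:
        if len(node['store'].get(key, [])) >= node['capacity']:
            break
        placed_at = node
    if placed_at is not None:
        placed_at['store'].setdefault(key, []).append(value)
    return placed_at

def get(route, key):
    """Return the first value found along the route."""
    for node in route:
        if key in node['store']:
            return node['store'][key][0]

put(route, 100, 30)       # Naomi announces her copy; the pair lands at David
print(get(route, 100))    # 30: Jacques learns Naomi's node ID from David
put(route, 100, 60)       # Jacques volunteers as a proxy; David is full,
                          # so the pair retracts to the intermediate node
print(get(route, 100))    # 60: Kamal is answered mid-route by Jacques's entry
```

Both loads spread at once: metadata queries stop short of David, and downloads shift from Naomi to Jacques.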


We covered a lot of ground in this lesson, spanning DHTs, CDNs, key-based routing, and how to avoid overloading the metadata server and the origin server using the concept of a sloppy DHT. What the Coral system does is offer a vehicle to democratize content generation, storage, and distribution through a participatory approach. I encourage you to read the paper in full to understand how the system has been implemented. Of course, commercial CDNs such as Akamai do not operate this way; they're in it for the money. They contractually mirror the content for a customer, and they deploy additional mirrors of their own to deal with dynamic increases in the volume of requests. With this discussion, we complete the lesson module on Internet-scale computing.