ud710 ยป

Contents

Introduction

Welcome back. We now launch into the study of distributed systems. This is an exciting field. On a personal note, I started my academic career as a researcher of distributed systems in the mid 80s. Part of the fun, but there's a lot of parallels between distributed systems and parallel systems. What fundamentally distinguishes a distributed system from a parallel system is the individual autonomy for the nodes of a distributed system as compared to a parallel system, and the fact that the interconnection network that connects all the nodes in a distributed system is wide open to the world, as opposed to being confined within Iraq. Or a room or a box. However, as the feature size of transistors in silicon continues to shrink due to advances in process technology and break throughs in VLSI technology, many of the issues that are considered to be in the domain of distributed systems and are surfacing even within a single chip. But I digress. In this lesson we are going to learn the fundamental communication mechanisms in distributed systems and what an operating system has to do to make communication efficient. As in the previous lessons, you will see the symbiotic relationship. Between hardware in the form of networking gear and the operating system software stack, particularly the protocol stack, to make the communication efficient. We'll start this lesson module with a definition and a shared understanding of what we mean by a distributed system. But first, a quiz to get you started.

What is a Distributed System

So, I want to know, what do you understand by a distributed system and I am going to give you three choices, nice perfectly fine if you don't get it right but I just wanted to get you thinking in terms of what a distributed system is and what your understanding is. The first choice says that a distributed system is a collection of Nodes connected by a Local Area Network or a Wide Area Network. Second choice says that, a distributed system is one in which communication happens only via messages. And the third choice is that, a distributed system is one in which events that are happening on the same node, here like A and B. The time between that is called the event time. And the event that is happening across nodes, which is a communication event, Node N1 sends a message to node N2, it's a communciation event from A to C. So, the third choice is saying that communication time TM, is much more Significantly more than the event execution time. That is the third choice so you have three choices and I want you to think about what's your definition, what's your mental model of distributed system is and how any of these choices fit or do not fit your mental model of what you think a distributed system is.

What is a Distributed System

If you chose all 3, your right on. We'll talk about why all these 3 choices, make perfect sense.

Distributed Systems Definition

So a distributed system is a collection of Nodes which are interconnected by a Local Area Network or a Wide Area Network and this Local Area Network may be implemented using a twisted pair, coaxial cable and optical fiber And if it is a Wide Area Network, it could be implemented using a satellite communication microwave links and so on. And the media access protocols that may be available for communication of these nodes on a Local Area Network or a Wide Area Network, maybe ATM or Ethernet and so on and so forth. That's sort of the picture of What a distributed system is, number one. Number two, there's no physical memory that is shared between nodes of the distributed system. So the only way nodes can communicate with one another is by sending messages on the local area network to one another. So there is no shared memory for communication. Between the nodes of the distributed system. No physical memory for communication between the nodes of the distributed system. And the third property is the fact the even computation time, that is the time it takes on a single node to do some meaningful processing, that computation time is what we are calling as the event computation time. That is Te. And a node may also communicate With other nodes in the system and that's what we're calling as communication time or the messaging time, TM. And the third property of the distributed system is that the time for communication between nodes in the system, TM. Is much more significantly larger than the event communication. So these are the three properties which I would like to think of to make sure that we have a shared understanding of what we mean by distributed systems. That they are connected by some sort of local area network or wide area network. A collection of nodes. And their own physically shared memory, so the only communication, only way they can communicate with one another is via messages that are sent between the nodes using the local area network. And the third property, is the fact that the message communication time is significantly larger And even computation time that happens on a single node. You probably remember a good friend, Leslie Lamport. I introduced you to him when we talked about parallel systems, and I said that we will see him again. In parallel systems, he's the one who gave us the notion of sequential consistency, and the same person that we're going to be talking about in this lecture, Leslie Lamport. And in particular LAN port has a definition for a distributed system and the definition of a distributed system verbatim goes like this. A system is distributed if the message transmission time, Tm, is not negligible to the time between events in a single process. Cause there's a time between events in a single process, there's a message transmission time, and so the definition that Lesley Lamport gives is that a system is distributed is a message transmission time tm, is not negligible compared to the time between events in a single process. What is the implication of this definition? Interestingly, even a cluster is a distributed system by this definition. We've been talking about clusters a lot when we discussed parallel systems and I told you that clusters are the work horses of data centers today. Even a cluster is a distributed system by this definition because processors have become blazingly fast, so the event computation has shrunk quite a bit. On the other hand the message communication time is also becoming better but not as fast as the computation time that happens on a single processor and therefore even on a cluster which is all contained in a single rack in a data center, the message transmission time is significantly more than The event time. And so even a cluster is a distributed system by this definition. The importance of this inequality is in the design of algorithms that are going to span the nodes of the network. What we want to make sure, because the message transmission time is so significantly larger than the event computation time on a single node In structuring applications that run on distributed nodes of a system like this, one has to be very careful to make sure that the computation time in the algorithms that you're designing is significantly more than the communication time. Otherwise, we are not going to reap the benefits of parallelism. If, most of the time you're communicating. That's, the reason why, this definition of distributed system is extremely important.

A Fun Example

We're going to look at a fun example. This is me, and I'm going to India for Christmas holidays. And I'm going to make an airline reservation. I'm going to use Expedia to make the airline reservation. So what I'm doing on my computer, I'm sending a message to Expedia, saying, hey, make a reservation for me. And Expedia chooses to make the reservation using Delta, so it sends a message. a to b is a message that I sent to Expedia saying I need a ticket to go to India, preferences, and so on. Expedia then sends a message to Delta booking the reservation that I want. Delta confirms by this message, e to f, that yes, Kishore's reservation is in. And once Expedia has received this confirmation from Delta, it sends me a message, g to h. And this message is telling me that I've go the airline reservation booked. So all of these are messages. a to b is a message, a is the sending of the message, b is the receipt of the message. And c is the sending of the message from Expedia to Delta. e is the confirmation that my reservation is in from Delta to Expedia and finally g to h is the message from Expedia to me saying that yes, you have your reservation, you can go to India in December. That's good. And then, what I'm doing is, I'm directly contacting Delta, message from me, me to Delta, asking for my preference for food. Fortunately, it's an international trip, so I'm going to get a little bit more than peanuts on, on the Delta flight to India. So I sent a message asking for my meal preference and Delta confirms that yes, you have your meal preference. That's the message k to l, is the message that confirms that I have my meal preference, I'm all set. So everything that I've described here is what you probably do on a routine basis, every time you're making any travel plans. Either contacting Expedia or some other web portal to make your airline reservation. All of this makes logical sense, right? There are several beliefs that are ingrained in this picture here about the ordering of events in the distributed system that makes all of this work. In particular, when we look at the set of events that you're seeing here as events that I'm responsible for, we think that these events are happening in sequential order. So for instance, if you look at what Expedia is doing, it is receiving my message saying that I want an airline reservation to be made, does a bunch of bookkeeping. Then sends this message over to Delta saying that well, go ahead and make this booking for him, gets the acknowledgement back from Delta. And then it does a bunch of other bookkeeping, once it gets the acknowledgement from from Delta and then it tells me that, okay, you've got it. And after that, it does some more bookkeeping to say that, well, you know to show if booking is done and I'm going to make some internal notes on the details of this booking. And so those are all things that are happening as events within Expedia. So the beliefs that we have is that processes are sequential, that is, the events that we see happening in a given process, these are the events that are happening in a given process, we expects these events to be totally ordered, right? So for instance, you wouldn't expect given this ordering of events, that you see in Expedia's profile that this event m happened before sending this message c. Right? So that's the mental model that you have, that events are totally ordered within a single process, and that's why we're calling process sequential. That is, the execution of a process is a textual order that you see. At least the apparent effect of the execution of the process that you, as a user, experience is sequential. So if I look at this particular process, h happens before i, f happens before g, and d happens before e, so all of these are things that are ingrained in our mental model of processes being sequential. The other belief is that you cannot have a receipt of a message before the send is complete, right? So you have to send the message before it can be received. In other words, the receipt of a message, which is b here, has to happen after the messages are being sent from here. Similarly, this message reception f must have happened after the message was sent from Delta. So those are the core beliefs that we have about what is happening with events in a distributed system. That events within the process are sequential. And across processes, when you have communication events, send happens before receive. So these are two core beliefs that we have about the working of a distributed system. And we call these beliefs, as they happened before, relationship.

Happened Before Relationship

So lets dig a little deeper into what we mean by the Happened Before Relationship. I'm going to denote the Happened Before Relationship with this arrow. A happened before B. That's what this notation means. What this notation is implying is one of two things, either A and B are events in the same process which means given a belief that a process is sequential A must have happened before B. If it was a textual order A is here and B is here then A must have happened before B and that's one possibility. Or if you're asserting that A happened before B and A and B are not events on the same process but A is an event in one process, B is an event in a different process. Then there must be a communication event that connects A and B. In other words, if A is a communication event of a message, and B is a receipt of that same message. Then, A happened before B, where A is the sender of the message and B is the receiver of the message. So, this is the implication of saying that an event in a distributed system A happened before B, and these events can be anywhere in the system. Anywhere in the system an event could be happening A, and another event could be happening B, and if we are asserting that A happened before B, what we are implying is one of these two possibilities. One is that A and B are events in the same process or A is the act of a sending a message and B is the act of receiving a mess, the same message on a different node of the distributed system. The other property of the happened before relationship is that it is transitive. What I mean by that is, if we're asserting that there is an event A that happened before B. And this event B happened before C. The implication is this relationship is transitive ad therefore A happened before C. So that's the transitivity of the Happened Before Relationship. Now that I introduced to you the Happened Before Relationship it is time for another fun quiz.

Relation

Consider the following set of events on node N1 and node N2. N1 is sending a message and the act of sending the message, the event associated with that is F. And A is the act of receiving the same message on node N2. And B is another event on node N1. Textually follows this send event. And G is another event on node N2. Textually follows the receive event G. Now the question for you, what can you say about the relationship between the events A and B? What can you say about the relationship between these two events A and B? I'm going to give you three choices. The first choice is A happened before B, so I'm going to say that is the first choice. The second choice I'm going to say is B happened before A. And the third choice is neither. And it's okay if you get it wrong, it's just to, sort of, prime the pump for the next concept that I'm going to describe to you.

Relation

The right answer, it's neither. That is you cannot say anything about the order between A and B, given what I'm showing you here. We'll explain more about this in the continuation of this lecture.

Happened Before Relation (cont)

Now that we understand the" happened before" relationship and the transitivity of" happened before" relationship. I also want to introduce this notion of concurrent events. That is, concurrent events are events in which there is no apparent relationship between the events. So, for instance, I'm showing you two nodes of the disterbent system. A is an event on one node, and b is an event on another node, and you can see that since a and b are not events on the same node we cannot apply the sequential process condition to say that there is an ordering between a and b, and by the same token, since a is an event here, b is an event here And there is no communication between these two guys that connections these events in any shape or form, either directly or transitively. There is no ordering between a and b, and therefore these two events are concurrent events, not sequential events, not connected where they happened before relationship, but they're concurrent events. So, in other words, We cannot say anything about the ordering of a and b in the distributed system. So, this is the fun thing about a distributed system is that has "Happened Before" relationship which looks at either events on the same process or events across process is connected by communication and the transitivity of "Happened Before" relationship Through the native happened-before relationships give at best a partial order for all the events that are happening in the system, so there's no way for us to derive a total order by looking at the events that are happening on the same process or just looking at the events that are happening on the different processes. In the distributed system and this is a very good example why it's impossible to get a total order for all the events that are happening in the distributed system. Because there are events they are going to be concurrent, that's the nature of the game that these procesors are executing asynchronously with respect to one another and therefore The event that is happening over here. You know, if I want to look at wall clock time in one execution of the distributed program, it's possible that a in real time, happened before b but the same program when I execute it again, the second time around, it could be that this event b happened before a. And therefore, these point about these concurrent events that happened before relationship is that And structuring a distributed algorithm. It's important to recognize what events are connected by the happened before relationship, and what events are concurrent events, and once you have an understanding of these two concepts, then you can build robust distributed systems and robust applications. Because if you have any assumptions about the ordering of events that are unconnected by communication like this, that can lead to an erronious program. So one of the bane of distributed programs is synchronization and communication bugs and timing bugs. And this is a classic example. Of a timing bug that you can have if you mentally think that A happened before B and that's the way you want it to happen. It may not happen. Because these two events are concurrent events. Now that I've introduce to you all important basics of events and ordering of events in distributed system. It's time for another quiz.

Identifying Events

Now this quiz is an open-ended quiz. And in this, I am giving you the same example, the my fun example of me purchasing a trip to got to India for the Christmas holidays. And, so I am showing you all the communication events. The communication events are shown by these lines here, A to B, C to D, and so on. And this is a textual ordering of events. In, that same process. So these are eventually the same process. And the question that I'm going to ask you, like I said, it's an open-ended question. I want you to identify all the events, that are connected by, the happened-before relationship. I told you that this is the notation that I'm using for happened-before relationship. So I want you identify all the events. And I'm showing you a whole bunch of events starting from A working your way through to L. So all these events. And what I want you to do is identify all the events that are connected by this happened before the relationship. You can choose to connect both events that are directly connected that happened before, as well as transitively connected what happened before relationship. And in the second box, I want you to identify all the events. That are concurrent events in the system. So there's a whole bunch of events here, and I want you to identify the events that you think are concurrent events. Meaning there is no way for, for us to assert that there is any ordering of the events in the system.

Identifying Events

So what I'm showing you here is all the events that are within a single process. Like, all these events are events within my process. These events are within expedia's process. These events are within Delta process. And I'm sure that most of you have identified all these as events that are connected by, the happened-before relationship and here I'm showing you all the events that are connected by the communication events. A is the message sending from me to expedia. B is the receiver of the message. Similarly, C to D and E to F, G to H, I to J, K to L, those are all the communication events. All of them are connected by this happened-before relationship. And if you look at this event m, that's a concurrent event with respect to the events that are happening on my process as well as the Delta process. So those are the things that I've marked as concurrent events. If you didn't get all of these right, that's okay.

Example of Event Ordering

Returning to our original example of me ordering a ticket to go to India via Expedia and Delta. Let's now identify all the events that are connected directly by the happened-before relationship. So, if these are the events in my process, then we know that E happened before H. And, we know that H happened before I. And, we know that I happened for L. So that's the textual ordering, and we know the process is sequential. This is the ordering of events, in my process. And similarly, we can see that, if these are the events in Expedia's process. Then all of these events have to be sequentially ordered. SO, B should have happened before C C should have happened before F, F should have happened before G, and G should have happened before M. So these are the orderings of the events in Expedia's process and similarly we can derive the order of events in the delta process, as sequential. And these are all the communication events that are directly relating events happening between any two processes that I'm showing you in this picture. So, for instance, E to F is a message from Delta back to Expedia confirming my reservation. So E is the act of sending the message from Delta, and F is the act of receiving the same message from Delta on Expedia. So, those are all the communication events. And now you can also look at transitive events, so for instance. What is the relationship between, let's say, event E and event A? Well, it turns out that A must have happened before event E. And the reason is, if you look at A, it happened before B, B happened before C, C happened before D. All of these are communication events, pretty straight forward. So from here to here, it's not a communication event, but since the process is sequential, B should have happened before C, and of course C happened before D. So, since it's a communication event, sequential process D should have happened before E. And that's what gives a transitive relationship between A and E, that A must have happened before E. Similarly, we can identify other events that are transitively connected to one another. Because of the happen-before relationship. So, for instance, D and M apparently don't have any direct connection. But, through the transitivity of events that are happening sequentially, and through the communication, and sequentiality of a process, we know that D must have happened before M. So those are transitive events. And finally, let's look at concurrent events. If you think about this event M that's happening in Expedia. Basically Expedia has confirmed to me that I have the booking that I want. Then it is doing some internal bookkeeping, to record some information about me, maybe my preferences in terms of airlines and so on and so forth. And from that point of view, it is making some internal bookkeeping and that's is event M. And now if you look at this event M, it has no relationship to any of the events that are happening here. I'm showing G, this event H must have happened after G, but what about H and M? There is no relationship between these two guys. This could've happened much later than, than this in wall clock time. Or it could've happened much sooner than event M. So you can see that, H is concurrent with M, and in fact, all the events that you see here, are going to be concurrent with M. And similarly, all the events that you are seeing over here, they're concurrent with So in fact, they're concurrent with M. After this, if you look at the Delta process, after Delta has sent this message to Expedia confirming my booking, it may have done a whole bunch of events, over here. All of those events are concurrent with M, because there is no ordering between these events, and the events over here. But there is an ordering between the event, G and the event J here. Because G happened before H, H happened before I, I happened before J. So, you can see that transitivity connects events across machines. But they could be events that are happening in the distributed system, that are unconnected to other events. And those are concurrent events. That completes discussion of the basics of distributed system. Next we're going to start talking about Lampard's clock.