ud710


## Introduction

Now that you have learned the basics of distributed systems, in particular the happened-before relationship, we are ready to talk about Lamport's clock.

## Events

Let's say that in the distributed system there are two events. I don't know where they are happening. There's an event called a and there's an event called b. Somewhere in the distributed system, these two events are happening. And it so happens, when I look at the record of all the timestamps associated with the events, I see that the timestamp associated with a is less than the timestamp associated with b. So I want to make sure that you understand the premise of the problem here. What I am saying is that the timestamp associated with the event a happens to be less than the timestamp associated with the event b. That's what I am observing by looking at a log record of all the events that took place in the system, and now my question to you is: if C(a) is less than C(b), does that mean that a happened before b? Or does it mean b happened before a? Or does it mean a happened before b only on the condition that either a and b are events in the same process, or a is the act of sending a message and b is the act of receiving the corresponding message? You have to pick the right choice among these three choices.

## Events

This conditional statement is the right choice, and the reason is that the timestamps themselves don't tell the whole story: all that each node has is a partial view of the events happening in the system. We'll elaborate on that a little bit more when we describe Lamport's logical clock in its entirety.

## Logical Clock Conditions

So now we're ready to describe the conditions for the logical clock proposed by Lamport. The first condition is that if I have two events, A and B, in the same process, then the clock value, or timestamp, associated with event A has to be less than the timestamp associated with event B. In other words, we have a counter, a logical clock, on every node of the distributed system that is monotonically increasing as events happen on that process. The second condition is that when we have a receipt of a message, we want to make sure that the receipt of the message has a timestamp that is strictly greater than the sending timestamp. So in other words, if A is the act of sending a message from process Pi and D is the act of receiving the same message on process Pj, then what we are saying is that the timestamp associated with the event A has to be less than the timestamp associated with the event D. In other words, we want to choose the timestamp associated with D as the max of the timestamp that I see in the incoming message, incremented by some value, and whatever the local counter is saying. Those are the two things I'm going to look at, and I take the max of them as the timestamp to associate with event D. And what about concurrent events? In this case, if I look at this picture here, A is the act of sending the message and D is the act of receiving the same message, while B is an independent event that's happening on process Pi; it has nothing to do with this event D that is happening on process Pj. These are concurrent events. We've already seen concurrent events when we talked about the happened-before relationship. In the case of concurrent events, the timestamps are arbitrary. Just by looking at the timestamps, I cannot say that B happened before D, because the timestamp associated with B here happens to be four.
Over here, we picked the timestamp for D by saying that it has to be at least greater than the incoming timestamp, so we gave it a timestamp of 3. And so if I look at these two events, B and D, D has a timestamp that is smaller than B's, but that does not mean that D happened before B, because these two are concurrent events, and therefore there's no way to know which event happened before the other. In other words, just because we find an event X whose timestamp is smaller than the timestamp associated with another event Y doesn't mean that X happened before Y. So while the forward condition is important, namely that if an event A happened before B, as we show in this picture, we have to ensure that the timestamp associated with event A is less than the timestamp associated with event B, the converse is not true. That is, if I have two events, and it so happens that the timestamp associated with this event X is less than the timestamp associated with this event Y, that does not mean that X happened before Y. This is very, very important. What that means is that Lamport's logical clock gives us a partial order of the events happening in the entire distributed system. So if I take any one process, I know the ordering of all the events that happened on this process: both the events that happened sequentially in this process itself, as well as the events in which this process happened to communicate with other processes. In this case it sends a message over here. Similarly, by looking at the record of all the events accumulated on process Pj, I can know the order of all the events in which Pj had a part to play, meaning all the local events as well as the communication events that Pj participated in when it communicated with the other nodes in the distributed system.
What Lamport's logical clock gives you is a partial order of all the events that happened in the distributed system.
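The two clock conditions described above can be sketched in a few lines of code. This is a minimal illustration, not the lecture's own code; the class and method names are made up, and the scenario at the bottom mirrors the picture (a = send on Pi, d = receive on Pj, b = an independent local event on Pi, concurrent with d).

```python
# A minimal sketch of Lamport's logical clock rules: one monotonically
# increasing integer counter per process (all names are illustrative).

class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        # Condition 1: the counter increases for every local event.
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event; its timestamp travels with the message.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Condition 2: the receive timestamp must exceed the send timestamp,
        # so take the max of the message time and the local counter, plus one.
        self.time = max(self.time, msg_time) + 1
        return self.time

pi, pj = LamportClock(), LamportClock()
a = pi.send()          # a gets timestamp 1 on Pi
d = pj.receive(a)      # d gets timestamp max(0, 1) + 1 = 2 on Pj
b = pi.local_event()   # b gets timestamp 2 on Pi -- concurrent with d
```

Note that b and d end up with equal timestamps here even though neither happened before the other, which is exactly the point: timestamps only respect happened-before in one direction.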

## Need For a Total Order

Is a partial order good enough for constructing deterministic distributed algorithms? It turns out it may be sufficient for many situations. The airline reservation example I started with would work fine with a partial order of events dictated by Lamport's clock. But there are situations where there may be a need for a total order of events in the distributed system. Let's look at an example. I'm going to use an example from my own personal life to illustrate the need for a total order. I have one car, my family consists of my wife, my son, and my daughter, and we share this single car. What we want to do is make sure that we can make a local decision on who gets dibs on using the car at any point in time, and we're going to use Lamport's clock for this. So what we do is, whenever we want the car for our personal use, we text everyone with a timestamp. If I'm requesting the car, I'm going to text everyone and associate a timestamp with that request. It is a logical timestamp, and similarly my wife would do the same thing; my son and my daughter, all of us do the same thing. And how do we pick the winner? Well, locally each of us can look at the timestamps of the requests that have come in from the others, along with our own request, and whoever has the earliest timestamp wins. Pretty simple, right? Everybody makes a local decision by looking at the timestamps of the requests that have come in and says, well, right now it's my son's turn to use the car, or my daughter's turn to use the car, and so on. But what if the timestamps, because these are locally generated by each one of us, happen to be the same? So, for instance, let's say my son makes a request: he sends a text message with a timestamp of ten to all of us.
So this is the blue arrow that's going everywhere, indicating to all three of us that he wants the car, and timestamp ten is when he generated the request. It so happens that my wife also makes a request for the car, with exactly the same timestamp of ten. That's the purple arrow that you see. So now we have a problem. All of us are looking at these text messages and trying to decide who's got dibs on using the car. How will my son and my wife know, given that both timestamps are the same, which one is the winner? Now, what we do is break the tie, and I'm going to stipulate that age wins. Therefore, in this case, if the timestamps happen to be exactly the same, then my wife, by seniority, is the winner. She gets the car. That's how we break the tie. You can see, through this example, that there is a need for a total order in decision making in a distributed system. You want to make local decisions, without bothering anyone, based on the information that you have, but you have to make that local decision unambiguously, because you cannot have both my son and my wife thinking that they have the car at the same time. That would be a problem. So whenever there is a tie, we have to break it, and that's the need for the total order.
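The family-car rule can be sketched as a one-line local decision. The names, timestamps, and seniority ranks below are made up for illustration (lower rank means more senior); every family member would run the same rule on the same set of requests and reach the same answer.

```python
# Illustrative sketch of the family-car decision: each request carries
# (who, logical timestamp, seniority rank); lower rank = more senior.

requests = [
    ("son",      10, 3),   # son requests the car at timestamp 10
    ("wife",     10, 1),   # wife requests at the same timestamp: a tie
    ("daughter", 12, 4),   # daughter requests later
]

# Everyone applies the same rule locally: earliest timestamp wins,
# and ties are broken by seniority (lower rank wins).
winner = min(requests, key=lambda r: (r[1], r[2]))
print(winner[0])   # prints "wife": seniority breaks the tie at timestamp 10
```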

## Lamports Total Order

Having seen the need for a total order and unambiguous decision making in a distributed system, let's now introduce Lamport's total order formulation. This is the notation we're going to use for Lamport's total order. What we are saying here is that there are two events, a and b, in the distributed system; a happens to be an event on process Pi and b happens to be an event on process Pj. If we want to assert that a is totally ordered ahead of b, that is, a precedes b in the total order, this is true only under the condition that either the timestamp associated with a is less than the timestamp associated with b, or the timestamps happen to be the same and there is some arbitrary other function that helps us unambiguously decide which event precedes the other. For instance, I might say that the process IDs associated with Pi and Pj may be what I use to break the tie. In my family car example, I told you that the seniority of the family member decides how we break the tie. So for instance, in this case, if process Pi has a process ID of 100 and process Pj has a process ID of 200, then we could say that the arbitrary decision in the case of a tie is that whichever process has the lower process ID is the winner. We might decide that in this case, if the timestamps happen to be the same, then since Pi is less than Pj, a precedes b. So it's an arbitrary, well-known condition for breaking a tie, and every process in the entire distributed system is going to use the same well-known, arbitrary condition in order to break the tie. That also brings up another interesting point: there is no single total order. The particular total order you get comes from the choice of the arbitrary, well-known condition. I gave you the example of my family car; we broke the tie in the family car situation by saying that seniority wins.
Tomorrow, as a family, we could decide that the youngest person is going to win. In that case, my daughter would have dibs on the car over everybody else in the case of a tie. The other important point to understand is what this whole idea of associating logical timestamps with events, and then deriving a total order from them, is really for. We're going to believe the timestamps associated with the respective events and use those timestamps as a way of ordering them, so that we can develop a total order. If it happens to be a tie, everybody uses the well-known, arbitrary condition to break the tie, and that's how we derive a total order. Once we have derived the total order, the timestamps are meaningless; we don't care about them anymore. The whole idea of having these logical timestamps, creating a partial order, and deriving a total order from the partial order using this formulation is so that we can get a particular total order. And once we get the total order, the timestamps are meaningless.
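The total order condition above is just a two-part comparison, which can be sketched as a small predicate. The function name is made up; the process IDs 100 and 200 come from the lecture's own example.

```python
# A sketch of Lamport's total order: event a on Pi precedes event b on Pj
# iff C(a) < C(b), or C(a) == C(b) and Pi's ID is lower (the arbitrary,
# well-known tie-breaker that every process agrees on in advance).

def totally_precedes(a, b):
    # Each event is represented as a (timestamp, process_id) pair.
    ts_a, pid_a = a
    ts_b, pid_b = b
    return ts_a < ts_b or (ts_a == ts_b and pid_a < pid_b)

# With Pi = 100 and Pj = 200 (the IDs used in the lecture), a tie in
# timestamps is resolved in Pi's favor:
a = (5, 100)   # event on Pi
b = (5, 200)   # event on Pj
print(totally_precedes(a, b))   # prints True: a precedes b
```

Because the tie-breaker is deterministic and identical everywhere, any two distinct events compare one way or the other, which is what makes this a total order rather than a partial one.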

## Total Order

So, in this question, I'm showing you three processes, P1, P2, and P3, and these are the events that you can see happening in these three processes. What I want you to do is derive a total order, and in deriving the total order, we're going to use the process ID to break ties: the smaller the process ID, the higher the priority. In other words, P1 dominates P2, which dominates P3. So, using that, go ahead and derive a total order for this set of events that I'm showing you happening in the distributed system.

## Total Order

First we can write down the partial orders that we see in the distributed system. You can see that a happens before b, b happens before c, and c happens before e; these are coming purely from the chain of communication and the local events that are happening over here. When we come over here, we also observe that f happens before e, and that's because of the sequentiality of this process. Similarly, we can see that in process P1, a happens before d. So these are the partial orders, and we already have an ordering for the events that follow this chain, because the logical timestamps are assigned in this fashion. In order to derive a total order, we said we will basically believe the event timestamps and use them to order the events totally. So we can order these events totally: a at 1, b at 2, c at 3, e at 5. Now let's look at the concurrent events. The event d over here is concurrent with all the events in the other processes: d is concurrent with b and c, and it is concurrent with f and e. Similarly, if you look at this event f, it's concurrent with all the events that are happening in processes P2 and P1. So these are the concurrent events. Now, given these concurrent events, we have to somehow fit them into a total order. As I said before, there's no problem fitting the chained events into a total order, because they already have timestamps that distinguish them from one another: the timestamp associated with a is 1, b is 2, c is 3, and e is 5. So the real problem comes with f and d. Now, the sequentiality of this process is what made the timestamp associated with e equal to 5: the message that came over here had a timestamp of 3, but we associated a timestamp of 5 with e, because the local event preceding e has a timestamp of 4. So we picked 5 as the timestamp to associate with e, so that the sequentiality of process P3 is respected.
So, we've got this, and now we've got f and d. f obviously is going to sneak in before e; that's no problem, that is coming from sequentiality. But where will we put d? Should we put d after e, or before e, in the total order? This is where breaking the tie using the process ID comes into play. Because these two are concurrent events in the system, we break the tie using the process ID. P1 happens to be less than P3 in the process ID space, and therefore we are going to say that, in the total order, d is ordered before e. So the final total order that we end up with is a, b, c, f, then d, and then e. That's the total order that we come up with, respecting the logical timestamps associated with the events and breaking ties using the process ID.
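The whole derivation amounts to sorting on the pair (timestamp, process ID). The timestamps below are inferred from the narration (a=1, b=2, c=3, f=4, e=5), and d is assumed to carry timestamp 5 as well, since the narration says the d-versus-e decision needed the process ID tie-break; if the figure assigns d a different timestamp, the sort key still applies unchanged.

```python
# Reproducing the quiz's total order by sorting on (timestamp, process ID),
# with smaller process IDs winning ties. Timestamps are inferred from the
# narration; d's timestamp of 5 (tying with e) is an assumption.

events = {
    "a": (1, 1),  # (timestamp, process ID): a on P1
    "b": (2, 2),  # b on P2
    "c": (3, 2),  # c on P2
    "f": (4, 3),  # f on P3, concurrent with everything on P1 and P2
    "d": (5, 1),  # d on P1, concurrent with b, c, f, e
    "e": (5, 3),  # e on P3; ties with d, loses on process ID
}

# Python compares tuples lexicographically, so sorting by the
# (timestamp, pid) pair is exactly Lamport's total order rule.
total_order = sorted(events, key=lambda ev: events[ev])
print(total_order)   # prints ['a', 'b', 'c', 'f', 'd', 'e']
```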

## Distributed ME Lock Algorithm (cont)

So, we can informally argue the correctness of the distributed mutual exclusion lock algorithm. The correctness is based partially on some assumptions and partially on the construction. The construction is that the queues are totally ordered by Lamport's logical clocks, with PIDs used to break ties. But that's part of the construction of the algorithm. It is also based on some assumptions that we make. The first assumption is that messages between any two processes arrive in order. Messages don't crisscross each other: if I send a message and then send another message, the first message is going to reach the destination first, and the second message second. That's what is meant by saying that messages arrive in order at every node in the distributed system. The second assumption is that there is no loss of messages. Every message that is sent is definitely received, in order. These are the two fundamental assumptions that are responsible for this algorithm being correct. Now that you have seen Lamport's mutual exclusion lock algorithm, it's time for another quiz.
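The per-node state the correctness argument relies on can be sketched as follows. This is only an illustration of the data structures, assuming the algorithm as presented in the lecture: a request queue totally ordered by (timestamp, PID), plus a record of acknowledgments. The class and method names are made up, and all networking is omitted.

```python
import heapq

# Per-node sketch of the state in Lamport's distributed ME lock: a queue
# of requests totally ordered by (timestamp, PID), and the set of peers
# that have acknowledged this node's own request. Message delivery is
# omitted; the point is why the total order makes the local decision safe.

class MENode:
    def __init__(self, pid):
        self.pid = pid
        self.queue = []       # min-heap of (timestamp, requesting PID)
        self.acks = set()     # peers that have acknowledged my request

    def on_request(self, ts, pid):
        # A lock request (mine or a peer's) goes into the ordered queue.
        heapq.heappush(self.queue, (ts, pid))

    def on_ack(self, peer):
        self.acks.add(peer)

    def can_enter(self, my_ts, peers):
        # I may enter the critical section when my own request sits at the
        # top of the totally ordered queue and every peer has acknowledged
        # it -- every node evaluates this same rule and agrees on the top.
        return (bool(self.queue)
                and self.queue[0] == (my_ts, self.pid)
                and self.acks == set(peers))

n = MENode(pid=1)
n.on_request(3, 1)   # my own request, timestamp 3
n.on_request(4, 2)   # a later request from peer 2
n.on_ack(2)          # peer 2 acknowledges my request
print(n.can_enter(3, peers=[2]))   # prints True
```

The in-order, lossless delivery assumptions matter because they guarantee that by the time every peer has acknowledged my request, no peer can later surface a request that should have been ahead of mine in the queue.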

## Messages

So the question for you is: how many messages are exchanged among all the nodes for each lock acquisition followed by a lock release? Every process makes a lock request following the algorithm that we just discussed, uses the lock in the critical section, and once it is done with that, unlocks by sending messages again, as you saw. So the question is: how many messages are exchanged among all the nodes for each lock acquisition followed by a lock release, that is, the combination of lock plus unlock? Is it N minus 1 messages? 2 times (N minus 1)? Or 3 times (N minus 1)? Here N is the number of nodes in the distributed system.

## Messages

The right answer is 3 times (N minus 1).

## Message Complexity

So let's look at the message complexity of the mutual exclusion lock algorithm. With the lock primitive, when a process makes a lock request, it sends N minus 1 messages: in the distributed system there are N minus 1 peers, and every node has to send a request message to all of its peers, so N minus 1 request messages are sent out. In response to these request messages, every peer is going to acknowledge the request. So there are going to be N minus 1 messages traversing the network that are the acknowledgment messages for this lock request. Then the process happily uses the lock in the critical section, and it gets to the unlock primitive. In the unlock primitive, once again, we're going to send N minus 1 unlock messages, one to every one of the peers in the distributed system, so N minus 1 messages are sent over the network. The interesting thing you notice is that there's no acknowledgment for the unlock message, because of the assumption we make that messages are never lost. Therefore, when I send an unlock message, I know that everybody will get it; everybody will remove my request from their respective queues and go on with life. And therefore there is no acknowledgment for that. So if you count the number of messages involved in a lock plus an unlock, in total we have 3 times (N minus 1). That's the total number of messages incurred, that is, the message complexity of the distributed mutual exclusion lock algorithm. That begs the question: can we do better? The answer is yes, and the reason goes back to the condition that I said is used in deciding whether I can have the lock or not.
If you recall, I told you that the condition is: my request has to be at the top of the queue, and second, I should have either gotten acknowledgments for that request from everybody else, or received lock requests from my peers that have timestamps later than my own lock request. If the lock requests that I received from my peers have timestamps that are later than mine, I know that they are going to wait for me to be served before they use the lock. Therefore, here is what I can do as the receiving node for a lock request. Normally, when I see a lock request, I would go ahead and send an acknowledgment. But when I get a lock request and I see that this peer's lock request is going to be served after mine, I don't have to send an acknowledgment yet. What I can do is wait until I'm actually going to unlock. My unlock itself can serve as the acknowledgment for that particular node that made a lock request later in time than my own. So in other words, we can defer the acknowledgment whenever my lock request precedes my peer's lock request; we're combining the acknowledgment for a lock request with the unlock. If I do that, then I can bring the message complexity down to 2 times (N minus 1). What we're doing is defer the acknowledgment messages whenever our own lock request is ahead of an incoming request that we see from a peer. That's how we can reduce the message complexity of this algorithm to 2 times (N minus 1). By the way, the distributed mutual exclusion lock problem has been a fertile ground for researchers to think about new algorithms that shave the message complexity even further, and I invite you to look at the literature to see other work that has been done to reduce the message complexity to numbers even smaller than 2 times (N minus 1).
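The message counting above is simple arithmetic, and a tiny sketch makes the two cases concrete. The function names are made up for illustration.

```python
# Counting the messages in one lock+unlock round of the distributed ME
# lock algorithm: N-1 requests, N-1 acknowledgments, N-1 unlock messages
# in the basic scheme, and 2(N-1) when acknowledgments are deferred and
# folded into the unlock message.

def messages_basic(n):
    # request + ack + unlock, each going to the N-1 peers
    return 3 * (n - 1)

def messages_deferred_ack(n):
    # the unlock doubles as the deferred ack, saving one round of N-1
    return 2 * (n - 1)

print(messages_basic(5), messages_deferred_ack(5))   # prints 12 8
```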