Both Spin and Exokernel started out with the assumption that microkernel-based operating systems structure is inherently poised for poor performance. Why did they start with such an assumption? Well they used a popular microkernel of the time called Mach which was developed at CMU as an exemplar for microkernel-based operating system structure. But Mach had portability as an important goal. If we keep performance as the primary goal, can we achieve that with a microkernal based approach? In this lesson, we will look at L3, a microkernal based operating system design that provides a contrarian viewpoint to the spin and exokernal assumption.
Just to refresh your memory about micro kernel based operating system structure, the idea is micro kernel the micro kernel is providing a small number of simple abstractions. Such as address based and inter process communication and all the system services that you normally expect from an operating system. Such as the file system Memory manager, CPU scheduling, and so on are implemented as processes above microkernal. In other words, these operating system services run at the same privilege level as user-level applications. All of them in their own individual address spaces. And only the microkernel runs at a different level of privilege provided by the processor architecture. Since all the operating system services are implemented as server's processes on top of the microkernel, they may have to cooperate with one another in order to satisfy a particular user's request. And in that case, they may have to talk to one another, and in order to do that, they need IPC that is provided by the microkernel in order to accomplish what is needed by a particular system call emanating from an application.
What are the potentials for performance loss when you have a microkernel based operating system design? Well, the main source of potential performance loss would occur at border crossings as we have seen before, and border crossings have both an explicit cost associated with it as well as an implicit cost associated with it. The explicit cost is the fact that from an application which is at a particular protection level, namely the user level protection level of the processor architecture, you are slipping into the microkernel, and, which is at a different privilege level. That is the explicit cost in border crossing. And in order to accomplish a particular service that an application has requested, the service has actually provided by several processes above the Micro Kernal, and therefore, there are boarder crossings involved going from the application to the Micro Kernal to the particular service you're talking about Let's say a file system service. And on top of that, a system service like file system may have to consult other services such as a storage module or a memory management module, in order to complete the requested service of the application. In which case there are going to be protected procedure calls that are going to be executed. Between services that are part of the operating system. And these protected the procedure calls, because they are going across address spaces, is going to be more expensive than simple or normal procedure calls. And typically, protected procedure calls can be as expensive as 100 x, 100 times. Normal procedure calls and this is coming up because of the fact that each of these services in a micro-kernel based design is assumed to be implemented in its own address base to protect the integrity of each of these system services. Now why are protector procedure calls that much more expensive than normal procedure calls. This is where the implicit Cost of border crossings comes in. Whether the border crossing happens when we go from the user address piece into the kernel address piece or between one hardware address piece representing a paticular system service. To the hardware address space of another system service. And that implicit cost is a fact that we're losing locality, both in terms of address translations contained in the TLB, as well as the contents of the cache that the processor uses in order to access memory. All of those add up. In making protective procedure calls or going between user space and kernel space, that much more expensive.
The keyword when I describe the, the performance loss in micro kernel-based operating system structure is the potential for performance loss. What L3 micro kernel does is by proof of construction, they show that they can debunk the myths about micro kernel-based operating system structure. Now, L3 micro-kernel, being a micro-kernel, has a minimal set of abstractions. Has address space, threads, inter-process communication, and a service for providing unique IDs for subsystems that live on top of the micro-kernel. The exact details of these mechanisms provided by L3 micro-kernel is not that critical, but the important thing is the micro-kernel provides the minimal set of abstractions as we've been always talking about. Namely address space, threads, inter-process communication, and UID. And L3 argues that these are fundamental abstractions that any subsystem that lives on top of the micro-kernel, or any subsystem that you want to implement in the general purpose operating system requires these facilities. And therefore, L3 argues that micro-kernels should provide this as the minimal set of abstractions. How they actually provide it may differ from one micro-kernel to the other. But the important takeaway is that, this is the minimal set of abstractions that a micro-kernel should provide. Now let's talk about the system services. Now, as I mentioned in the previous slide, the system services have to be distinct protection domains. Protected from one another and protected from the applications that live on top of the operating system. And of course, we have this hard line between the applications and the servers providing the services and the micro-kernel. This is the structure of a micro-kernel based operating system. So what is different about every micro-kernel. The key distinction is that each of these services of the operating system. They have to be in their own protection domain, not necessarily distinct hardware address spaces. The point L3 establishes by proof of construction, is that there are ways to construct a micro-kernel based operating system providing these services. Efficiently, knowing the features of the hardware platform. In other words, L3's argument is, it's all about efficient implementation of the micro-kernel and not the principle of a micro-kernel based operating system structure. So to fully understand how L3 micro-kernel goes about systematically debunking the myths about micro-kernel based operating system structure, we have to understand, the strikes against the micro-kernel.
What are the strikes against a microkernel based design? The first strike is the border crossing cost going between kernel and the user and vice versa. And this you have to do every time a user-level process makes a system call. You have to go through the kernel. That's the border crossing cost. So that is the first explicit cost that can be a strike against microkernel if this were to happen too often. The second strike against a microkernel based design is address space switches. With the assumption that each system service is living in its own hardware address space, whenever an application needs any system service, that may involve the servers living above the microkernel having to talk to one another in order to provide that particular service that was requested by the application. And I mentioned that protected procedure call is the basis for cross protection domain calls. So here's the protection domain, a file system. Here is another protection domain, the storage module. And, if the file system has to get some service out of the storage module in order to serve the original request from the application process, that communication is implemented as a protected procedure call. And going across hardware address spaces minimally involves flushing the TLB of the processor in order to make room for the TLB entries of the domain that we're entering. And we'll talk about that in a little bit more detail shortly. But the key point is, there's an explicit cost involved in going from one address space to another address space, and that is the second strike against a microkernel based design. The third strike against microkernel-based design is the cost for doing thread switches. These thread switches have to be mediated by the kernel. If the file system needs to make a protected procedure call to the storage module, in order to complete an application level request for an operating system service, that involves the file system having to be mediated through the microkernel in order to go and execute some functionality in the storage module. So that involves a thread switch, and IPC, in order to do that. So in other words, the basis for protected procedure call is thread switches and interprocess communication, which has to be mediated through the kernel. And this kernel mediation can be expensive, that is, thread switches and interprocess communication can be expensive. That's the third strike against a microkernel. So all of these three strikes that I mentioned are explicit costs associated with providing an application level service in a microkernel based operating system, because the application has to first make a request to the micro kernel. Microkernel may have to pass that request on to server processes that are living above the microkernel, and the server processes may have to talk to one another. And this protected procedure call itself has to be mediated by the microkernel via thread switching and interprocess communication. And in addition to all of these explicit costs, there could be a fourth cost, which is the implicit cost. And this is due to the memory subsystem and the loss of locality that can happen when we are going across address spaces. And when the file system makes a protected procedure call into the storage module, at that point, we are changing locality from the address space of the file system to the address space of the storage module. And therefore, the thread that starts executing inside the storage module may not find the cache warm, meaning, the contents of the cache may not be reflecting the working set of the storage module that needs to get executed right now in order to satisfy the request coming from the file server. And this loss of locality can be both for address translation in the TLB, for this new address space that we are in, as well as for data and instructions that are contained in the cache. Both of those may take a hit because of the fact that we're going from one address space to another address space. That's the implicit cost of the locality loss that would be the fourth strike against a microkernel-based design. All of these strikes against a microkernel-based design, seems pretty solid. Now how does L3 go about debunking, the myths about microkernel-based design. Let's see that, systematically.
As I said, L3, by proof of construction, systematically debunks all of the myths I detailed in the previous slide. The first myth is the border crossing myth. That is, the user space to kernel space switching. Now L TLB misses that may be incurred because of the fact that we're going from user space to kernel space. As well as cache misses that may be incurred because we are starting to execute some code in the micro kernel. All of that put together, L3 shows, by proof of construction, that you can do that border crossing in 123 processor cycles. And L3 goes one step further, it actually cones through a back of the envelope calculation, what will be the minimum cost involved in doing this border crossing? Or, in other words, how many cycles will the instructions that are needed, in order to do the border crossing, take on a particular processor architecture? And they count the number of processor cycles, and they show that the number of processor cycles needed is about 107 and L3 microkernel accomplishes that in 123 processor cycles. Or, in other words, this is debunking the myth that border crossing is going to be expensive in microkernel based design. It's as close to what it would take minimally in the processor, even if you hand code through machine instructions, what needs to happen to go from the user space to the kernel space? So that is proof that microkernel-based design is not going to be incurring any more overhead for border crossing. It turns out that CMU's Mach operating system, on the same hardware, takes 900 cycles, as opposed to the 123 cycles taken by L3 for border crossing. Spin and exokernel that we have looked at in fairly great detail, used Mach as the basis for decrying microkernel based design, saying that, well, border crossing in microkernel based design is expensive because it takes this much time. But what L3 has shown is that it need not take this much time, it can be done in much shorter amount of time. Which is close to what the hardware can actually do.
This is a good time to bring up a quiz for you. This question is asking you, why did Mach take roughly 800 or more cycles than L3 microkernel for doing this border crossing between user and kernel? Is it because L3 uses faster processor? Is it because Liedtke is smarter? Liedtke, as you know, is the author of the L3 microkernel. Is it because Mach's design priorities are different from L3 microkernels? Or is it because microkernels are slow by definition? Those are the four choices. You had to pick the right choice.
The right choice is the design priorities of Mach. First of all, I already mentioned that the number that I showed you for Mach's performance for user to kernel switch, is on the same processing hardware as L3 uses, and therefore there is no cheating here, and the Liedtke may be smaller but that's not the reason either. And it is not because microkernels are slow by definition, but it is because of the fact that the design priorities of Mach was different. And in particular, Mach's design priority was not just extensibility of an operating system, but also portability. And we will talk more about how that portability design consideration comes into play. In the performance of an operating system later on. But at this point I just want you to understand that the reason why Mach took significantly more time for this border crossing compared to L3 micro kernel. Is nothing to do with the structure of the micro kernel. It is only the design priorities.
The second myth concerns going across protection domains and espeiclaly if the protection domains are impmlemented as distinct hardware address spaces, then the myth is that crossing protection domains implemented using hardware address space switch is expensive. And we are now talking explicit cost for crossing protection domains implemented as distinct hardware address spaces. Not the implicit cost, we're only talking about the explicit cost of crossing protection domain. Now, let's understand what is involved in an address space switch. This is a refresher to show you what exactly is happening in terms of translating a virtual address to a physical address. You all know about the concept of a TLB and the virtual address consists of two parts; an index part and a tag. The index is used to look up the TLB and the tag that is contained in that particular entry of the TLB is matched against the tag coming from the virtual address. If they match, then we got a hit, then this particular virtual address is contained in the TLB entry and you've got the physical frame number that corresponds to the virtual page number that you started with, so that is your, very simple, TLB, matching hardware that does the address translation from VPN to PFN. Now what happens on an address space switch? That is a context switch going from one address space to another address space. On a context switch, the virtual address to physical address mapping will change for the new process that is going to be scheduled. So in other words, if this TLB contains the translations for a particular process that is currently executing on the CPU, if we do a context switch, do we have to flush the TLB because the virtual address to physical address mapping is going to be different for the new process or new thread that is going to run on the processor now? The answer to the question do we have to throw away the mappings that exist in the TLB, when we do a context switch, is, it depends. It depends on whether the TLB has a way of recognizing that the virtual to physical address translation that is contained in it, is flagged by the distinct process for which those translations have been put into the TLB. In other words, if the TLB has address space tags in addition to the tag for disambiguating one virtual address from another virtual address then you don't have to flush the TLB. So these are what are called address space tag TLBs which contain the process ID for which a particular TLB entry is present in the TLB. MIPS Architecture for instance uses address space tagged TLBs. But on the other hand, an architecture may not use an address space tagged TLB. Intel 486, and Intel Pentium are examples of architectures that do not have address space tags associated in the TLB. So, in an Intel architecture, at the point of the context switch, you have to throw away all the entries that are there in the TLB on behalf of the process. In the Intel architecture, actually the TLB is split into two parts, a user part and a kernel part. And the kernel part is common regardless of which process is running. And therefore, you don't have to flush that portion of the TLB, the kernel portion of the TLB, but the user portion of the TLB has to be flushed on a context switch because the virtual address, the physical address mapping, is going to be different for the new process that starts to run on the processor.
Let's look at how address translations work. An, an architecture that supports address space tags in the TLB. Recall that the virtual addresses are being generated on behalf of a particular process, and the process has a process ID that is uniquely assigned by the operating system for that process. So when we make an entry into the TLB, what we are storing in the TLB is not only the tag that is associated with a particular virtual address, but also the PID of the process for which we are creating this entry in the TLB. So in other words, every entry in the TLB is flagged with the process ID that, that particular entry corresponds to. So how does address translation work in this address space tagged TLB? Well, similar to a normal TLB, we're going to take this virtual address and split it in two parts, the index part and the tag part. The index part is what lets us look up a particular entry in the TLB. And from the TLB we get two tags. One tag is the address space tag. This is signifying which process created this particluar entry. So what we want to do is, we want to compare the PID of the process that is currently generating this virtual address against the tag that is contained at that entry. And this matching hardware, is going to say yes or no, if it says no then we are done, then this entry does not correspond to the virtual address that we're trying to translate here. On the other hand, if it does match, this entry does correspond to this process, then we want to ask the question, whether the tag that is associated with this entry, is the same as the tag that is contained in the virtual address. So that's a second level of matching that goes on. And only if both the process ID matches and the tag matches do we have a hit in the TLB. So in other words, there is two level of matching going on. And therefore, when we do a context switch from one process to another process, there is no need to flush the TLB on the context switch, because the TLB may contain some translations on behalf of the previously executing process. And it may contain some translations that correspond to the new process that has been scheduled on it. And the hardware disambiguates these entries by doing this second level of matching of process ID against the address space stack that is contained in the TLB. But, if the memory management hardware does not support address space tag, then what do we do? And this is a case for instance, in the Intel architecture, that the TLB does not have address space tags.
Liedtke, the author of the L3 microkernal, suggests tricks for exploiting the hardware and avoiding TLB flushes, even if the TLB is not address space tagged. And in particular, Liedtke's suggestion is that take advantage of whatever the architecture offers you. For example, the architecture may offer segment registers in x86 and PowerPC, both of them offer segment registers. What the segment registers do is give an opportunity for the operating system to specify the range of virtual addresses that can be legally accessed by the currently running process. What that means is, even though you have a linear address space, that is, a linear virtual address space. You can carve out that linear virtual address space among several different protection domains by using segment registers. So, here is the linear address space provided by the hardware, starts from zero to a max and that max is, of course, decided by the number of bits you have for addressing in the hardware architecture. If it's a 32 bit architecture, you have 2 to the 32 as the maximum addressing capability of that particular processor. If you have 64 bits, you have 2 to the 64 as a maximum address space that's available in that particular architecture. So that's the linear address space that is provided by the hardware. If the architecture, such as the PowerPC, offers segment registers to bound the range of virtual addresses that can be legally generated by a running process. Then use segment registers to define a protection domain. So here is one protection domain, S1, and it uses segment registers to say that this particular protection domain can generate virtual addresses starting from here to here. Any other virtual address generated by this guy is illegal, and the hardware will check that, because the segment registers are hardware-provided facility for bonding the range of legal virtual addresses that can be generated by this protection domain. Similarly, another protection domain, S2, can use the segment registers to carve out a different portion of the linear hardware address space. So, for S2, the bounds for legal, virtual addresses that can be generated by, a thread that is running, in this protect domain, starts from here. Ends here. And so on. So, in other words, we can take the hardware address space that's available. And using segment registers, provided by the hardware, you can carve out the hardware address space into these regions. And once we do that, there is no need for flushing the TLB on a context switch. Because, even before we consult the TLB to see if there is a match for a particular virtual address in the TLB, segment register will act as a first line of defense. And say that, oh, this address is kosher. This address is not kosher because it is not within the bounds of legal addresses that can be generated by this protection domain. In other words, the segment bounds that we're talking about are hardware enforced, so this works really well. If these protection domains are fairly small, meaning that the amount of space, memory space, that is needed by any given protection domain, is not the entire hardware address space. But it is a portion of the hardware address space. So we're able to carve out the available hardware address space among multiple co-resident, protection domains, in the hardware address space, using these concept of segment registers.
You may ask, what if the protection domain is so large that it needs all of the hardware address space? Maybe the file system code base is so big that it needs the entire hardware address space. Cannot share it with anybody else. Similarly, the code base for the storage module is so big that it may occupy the entire hardware address space. Now, if this were the situation, that is, if the protection domains are so large that they pretty much occupy the entire hardware address space, and if the TLB does not support address space tagging, then you have to do a TLB flush on a context switch. If you go from this service to this service, you need to do a TLB flush because the memory footprint of each of these service is as big as the hardware address space that's available on the processor. Or in other words, the segments overlap and therefore you have to do a TLB flush when you do a context switch. The cost that we've been talking about so far is the explicit cost, that is, the cost that is incurred for flushing the TLB at the point of a context switch. And for small protection domains that we want to go across, we want to avoid that cost and we can do that by packing them in the linear address space provided by the hardware. But if the memory footprint of the service is so big that it occupies the hardware address space of the architecture completely, the explicit cost we're going to incur, because we have to do a TLB flush, if the TLB is not address space tagged. But that explicit cost is very insignificant compared to the implicit cost that is going to be incurred. Now what do we mean by implicit cost? We mean cache effects, that is, the loss of locality going from this service to this service, that the cache is not going to have the working set of this service, when we go from here to here. That impact is much more that the explicit cost. For example, LeKay shows that on the specific architecture in which they implemented L3, which is a Pentium architecture. It had 32 entries for kernel translations and 64 entries for user translations. Even if you want to flush the entire TLB, all the entries in the TLB, it takes 864 cycles to do that. But, the loss of locality, when a service goes from this to this, in terms of cache effects is going to be much more significant because, with a large address space you expect that you're doing more work in the subsystem. And the implicit costs are going to dominate.
So the upshot of address space switching is you have to really ask the question, are we switching between small protection domains or large protection domains? If you are switching between small protection domains, then by taking advantage of whatever the hardware gives you, you can make. The switching between these small protection domains efficient by careful construction of the services, on the other hand if the switch is from one large protection domain to another large protection domain. The switching cost, the explicit cost of switching from one hardware address space, to a different hardware address space Is not that significant, because the loss of locality when you go from a large protection domain to another large protection domain, both in terms of TLB misses that are going to happen, when you start executing the new large protection domain. For translations as well as the cache effects, that the fact that your cache is now going to be warm with the data and the instructions for the new protection domain. That cost is much more significant than just the switching cost going from one large protection domain to another large protection domain. So this is the way, the address-based switching myth, is debunked by the L3 microkernel, by construction.
The third myth about microkernel based design is that the thread switches and inter process communication can be very expensive. And by thread switch we mean, if you are executing one thread in a particular protection domain and going to execute another thread in a different protection domain, how much time does it take for this thread switch to be effected by the microkernel, that's the question that we're asking here. Here again, we're only talking about the explicit cost of doing the thread switching. And the explicit cost that is involved in thread switching, is saving all the volatile state of the processor, the T1 thread that is running on the processor. It's modified the registers of the CPU. All of that has to be stashed away in the thread context block before we can schedule the second thread to run on the processor. The cost that we're talking about is saving all the volatile state of the processor. And what L3 does is once again, by construction, it shows that the thread switch time in L3 microkernel is as competitive as SPIN and exokernel. So, once again by construction, L3 debunks that myth that microkernel based OL structure is going to be more expensive for thread switching compared to SPIN or exokernel or even a monolithic operating system.
The next myth is regarding memory effects. And that myth concerns the assertion that the lost of locality in a micro-kernel base design, is much more than in a monolithic structure. Or the structure that is advocated in spin and exo-kernel. But before we talk about the memory effects. There's a primer on the memory hierarchy. You know that, in the process of architecture, you have the CPU, you have the TLB. And you have several levels of caches, main memory. And the virtual memory that resides on the disk. And these caches are typically physically tagged. Now, what do we mean by memory effects? What we mean by that is, if you have this hardware address space, and this of course is much bigger than the amount of space that's available in these caches. And in fact, we know that the entire hardware address space may not even be in physical memory. Because In a demand-based system, when a process is executing, the pages that it needs will be demand paged from the virtual memory that's on the disk, and brought into physical memory. And when the processor is executing instructions, then the instructions and data contained in physical memory move into the memory hierarchy close to. The CPU, so that the CPU can access the working set of the currently running thread by getting it from the cache that is closest to it. That's the hope in this memory hierarchy. What we mean by memory effects is when we context switch between protection domains, how warm are the caches? Now recall that if we have small protection domains. P1, P2, P3, P4. Let's say they are all small protection domains. In that case, led case suggestion is, don't put each of these in its own hardware address space. Pack them together in the same hardware address space and then force protection, for these processes from one another, through segment registers. So when we have small protection domains, then, the caches are going to be pretty warm. Even when you do a context switch from one process to another process. There's a good chance, that the cache hierarchy is going to contain the working set of the newly scheduled small protection domain. So in other words, the memory effects can be mitigated by carefully structuring the protection domains, in the hardware address space. So debunking the second myth with respect to address space switching, also helps in reducing the ill effects of implicit costs, associated with address space switching, because these small protection domains. Occupy only a small memory footprint, and therefore occupy only a small memory footprint in the caches as well. And therefore, when we go across small protection domains, there's a good chance that, the locality for the newly scheduled small protection domain is going to be contained in the cache hierarchy. So the caches are probably going to be warm, when we do context switches, so long as the protection domains are small. We already mentioned that if the protection domains are large, you cannot avoid that. Whether it is a monolithic kernel or. The exokernel, or the spin type of extensibility. If the memory footprint of the system service is so big, it is going to pollute the cache when we visit that particular system service. So even if we have a monolithic kernel, and the monolithic kernel has subsystems, that occupy a significant portion of the hardware address space. Even though we are not doing any context for it, the ill effects of implicit costs in the memory hierarchy is going to be felt. Because the cache is after all a physically tagged. And therefore, when we go from large protection domains, or large subsystems in the context of a monolithic kernel. You have to incur the cache pollution. So that ill effect is unavoidable for large protection domains. So the only place where a monolithic kernel can win, or an exokernel can win, or a spin can win, is in small protection domains. And a microkernel can also win for a small protection domain, by packing multiple small protection domains in the same hardware address space. So this begs the question. Why was it so bad in Mac? Recall, I mentioned that, in Mac, on the same hardware, border crossing cost 800 more cycles, compared to the border crossing in the L3 microkernel. So we can ask the question, why was it so bad in the Mac microkernel?
The reason for Mach's expensive border crossing, is it's focused on portability. What that means is that, the Mach microkernel is architecture independent to allow that Mach microkernel to run on several different processor architectures. What that means is that there's going to be a code bloat in the Mach microkernel because it has to have the personality for any architecture that it may need to run on. In particular, in the Mach microkernel, there is an architecture independent part of the microkernel, an architecture specific part of the microkernel. The two together results in a significant code bloat, which means it has a large memory footprint. Which means, it has lesser locality. Which means, you're going to have more cache misses and pollution of the cache. And that is the reason why you have a longer latency for border crossing in Mach, as opposed to the theoretically smallest number of processor cycles in order to go from user space to kernel space. As I mentioned earlier, LeKay did back-of-the-envelop calculation on the same processor hardware and said that it should take about 107 cycles to go from user space to kernel space. And then he implemented L3 microkernel and showed that it took only 123 processor cycles to do that. And the reason for Mach's expensive border crossing is all due to the fact that there is code bloat, resulting in more cache misses and therefore incurring longer latency for border crossing. So in other words, Mach kernel's memory footprint is the culprit for the expensive border crossing. It is not the philosophy, or the principle, of microkernel based operating system design, because by proof of construction, L3 has shown that you can have very minimal border crossing overhead by careful construction. Another way of saying it is that portability, which was one of the main concerns in Mach design, and performance, are at loggerheads with each other. So if you focus on portability, you may sacrifice performance, and that's what we saw when we look at the difference between Mach's performance and L3's performance on the same processing hardware.
So, L3 Microkernel shares to debunk the myth about Microkernel-based operating system structure. It goes beyond that, and it has a thesis for how operating systems should be structured. The first principle advocated by L3 is that the Microkernel should have minimal abstractions that includes support for address spaces, threads, interprocess communication, and generating unique IDs. Why do we need these abstractions in the microkernel? The argument is that these four abstractions that I mentioned, address space, threads, IPC, and UID, Our extractions that are needed by any subsystem that provides a functionality for end users in an operating system and therefore, the principle of optimizing the common case suggests that these abstractions should be part of any microkernel. The second thesis coming out of L3 microkernel experience is that microkernels are process specific in implementation. In other words, if you want an efficient implementation of microkernel, you have to take advantage of whatever the hardware is offering you. Which suggests that micro kernels by the very nature are non-portable if you want to focus on performance. Making a micro kernel processor independent is essentially sacrificing performance. So if you put these two principles together, what L3 is advocating is the right set of kernel abstractions and processor-specific implementation. If you do that, then you can build efficient processor-independent abstractions at the upper layers. All of the services that we associate in a monolithic kernel like a UNIX operating system such as, file system, network protocols, scheduling, memory management. They all can be implemented in a processor independent way on top of a microkernel that provides the right set of abstractions and exploits whatever the hardware gives in terms of capabilities to get an efficient processor specific implementation. That's the thesis of L3 microkernel, processor specific kernel and process the independent abstractions at higher layers of the operating system stack.
Research on the structure of operating systems in the mid' 90s, as exemplified by the three research papers we studied in this course module, led to fascinating innovations in the operating system structure. It is important to note that all three systems, Spen, exokernel, and the LT microkernel were done contemporaneously. In this sense, they mutually informed each other and laid the basis for several innovations in operating system structuring. For example, many modern operating systems have internally adopted a microkernel based design. Similarly, technologies such as dynamic loading of device drivers into the kernel, have come out of the thoughts that went into extensibility of operating system services. The next lesson we're going to look at is a logical progression of the idea of extensibility, namely virtualization.