cs259 »

Back to Course Wiki

CS259 Lesson2


Assert Power Up

Welcome back again. This is the second unit in the debugging course. This topic is asserting expectations. Today, we will go and explore assertions. That is statements in the program that automatically check for errors during execution. Assertions are by far the most powerful debugging tool, in particular, because they are the key to automating debugging. That is debug while you sleep. If debugging were a video game and you would be hunted by bugs, then assertions would be a power pill that would help you catch all the bugs. Okay, specifically, we are going to learn how to learn how to write assertions, how to check assertions, and how to infer assertion from executions automatically. In our last lecture, we already had met the assert keyword from Python. Generally speaking, the statement assert condition, evaluates the condition, and then its behavior depends on whether the condition evaluates the true or to fault. If the condition holds, we proceed as usual. If the condition does not hold, however, then we interrupt program execution by throwing an exception. When you're writing your condition, you assume that condition will hold. This is normally useful when you're writing a test. For instance, if we want to test the function square root, we can write a test function that asserts the √4 is 2 and √9 is 3, and if any of these tests fail, that is if the result of √4 is not 2, then the assertion will fail and throw an exception. Such a functionality of assertion is available in all programming languages. You could even write your own assert function.

Built in Assertion Functionality

So in case an assert function would not be available in your language, you could easily roll out your own. I'm coming up with my own assert function which I call "my own assert" takes the condition. If the condition is false, then we raise an assertion error. Now, if I invoke say "my own assert" on 2+2=4, very simple test, we can run this and well, nothing happens because the assertion is met. However, if we assume the 2+2=5, we can run this and we get an assertion error that's being raised in here, and this gives us almost the same functionality as the built-in assertions. While you can easily roll out your own assertion, built-in assertions typically do have a couple of advantages. There is identification--the built-in assertion typically tells you which assertion failed. There is location that tell you where the assertion failed--that is where in the code. They are optional--meaning that they can be turned on or off although turning assertions off is mostly a bad idea, which we will discuss later. And finally, they're standardized--meaning that everybody can immediately recognize an assertion as such because they always take the same form. In the languages C and C++ for instance, assertions come with a built-in function named "assert" as in Python which you get by including a special file assert.h, and then, as in Python, you use assert except that you put the argument in parenthesis, and again, you pass the condition as an argument to the assert function. If this assertion failed, you're going to get a message on the console right before the program exits, and the message will look something like this--there you go. And now, you can see that the message actually has everything we wanted in our built-in assertions. So first, the failing assertion is identified down here as coming directly from the code. You see the location here--it's in the file foo.c in line 9. In C and C++, assertions are optional, you can turn them on and off using a simple argument to the compiler, and finally, they're standardized if you include assert.h, that's always the form, and this is what programmers look for and this is what programmers understand.

Failing Assertion in Python

Q. Now, compare the level of diagnostic output we've seen in C and C++ with the amount of diagnostic output you get in Python. If you write assert 2+2=5 as condition that evaluates the form. So which information does one get from a failing Python assertion. Do you get the failing assertion condition. In our case, 2+2=5. The location of the assertion in the program. In our case, line 1. The location of the defect in the code or the list of callers. Try it out for yourself, over to you.

S. Welcome back. What's the answer to this? Well, very simple--we go and try this out. Here's a failing assertion--2+2=5. Let's see, when we run this whole thing what we got? So here's the antitraceback. We get the assertion error. We get the assertion that failed. At least we get the line and the code, which is the same. We get the location--line 1 malfunction and we get the list of callers. So we've seen the list of callers, the location of the assertion of the program, and the failing assertion itself. What we do not get is the location of the defect in the code because the defect would actually causes the infection that later causes the assertion to fail may be in a completely different location. That's the whole point of debugging, trying to find the defect that causes an assertion to fail. However, during this course, you will find techniques that help you a lot in finding out where the defect is that causes an assertion to fail and all of these automatically.

Pre and Post Conditions

So far, we may have encountered assertions in tests, in particular, in unit test. Here, assertions are being used to check the results of a single run, mainly the test run. What we're going to look into is assertions that are integrated into the program code itself. Where the check all runs at once. A classic example for the use of assertions is a square root program. In the beginning, when invoking square root, we want to make sure that x, the parameter, is non-negative. I'll skip the actual computation of square roots here because this is the subject of another class, but before we return to the actual square root, we want to make sure that the square root actually is a square root. How do we do that--well, we square the square root and check whether it's identical to the original argument x. These two kinds of assertions form a pair and both of them have very specific objectives. An assertion that's being called at the beginning of a function and which checks the properties of the argument is called precondition. An assertion which checks the result of a computation before it's actually being returned is called a postcondition. If you have such explicit checks in your code, this greatly increases your chances of quickly finding the bug. Because these checks would re-invoke every single time that the function is executed and if they don't fail, you know that the computation is actually correct, which is a good thing. If you remember, our cause-effect chain has a succession of program states where a defect introduces an infection which is then propagate to become the failure. Imagine what happens if you actually do have assertions in your code that check large parts of your state. Now imagine what happens if you do have assertions in your code that check large parts of your state for whether they're valid or not. Suppose you have an assertion that checks all of these areas every time a function is invoke, then you wouldn't see the failure only at the last memory but you would see the failure as soon as the infection spreads throughout. If you have another assertion that covers this part of the state, then you would see the failure at the very moment it is introduced. which of course also means that you can immediately identify the defect that causes the infection. Earlier, you see an infection--for instance, because an assertion fails. The shorter the cause-effect chain, we need to investigate. Also, if you do have parts of the state that are covered by assertion and these assertions don't fail, then we know that the state in here is actually not infected and so in your search, you can narrow down your search towards those parts of the state, which are not covered by passing assertions and all of these happens automatically with every single run, every single test run, and if you want so, even every single production.

The Blame Game

Q. Assertion also easily allowed you to put the blame on specific places. Assume I have a function f with a pre and a post condition, and we call f and this precondition of f is satisfied. Now, f calls g and now, the precondition over here is violated and raises an exception. Quiz question, who is at fault here? Is it f because it violates the precondition of g? Is it g because it raises an exception? Or is it both f and g because they are incompatible? Over to you.

S. The correct answer is f because it violates the precondition of g. Pre- and post conditions are generally considered specifications-- that is they have the authority to say what is right and what is wrong, and if you violate a precondition, this means that the caller is wrong. G raises an exception well and that's just the messenger. We don't shoot messenger. And f and g, well, obviously are incompatible--yes, that's right, but since pre- and post conditions serve as specifications are definitely f is at fault for violating this specification and g is fine--it just raise the exception.

Testing Square Root

Q. If you have assertions in your code, not only can you detect errors much more easily but you can also test your code much more thoroughly. In particular, you can separate two things. A test consists of two parts. First we need to generate on execution and second we need to check the outcome. With assertions in your code, checking the results is already done so we can focus on generating the tests. Let me illustrate with a very, very simple example. The idea is that we're going to take a square root function with assertions integrated and just use random testing to test it. Here's a very, very simple implementation of the square root function. We simply invert the built-in Python square root function from the math module, and we have set up appropriate assertions to check for the precondition and for the post condition Now for the test driver, I have set up a loop with 1000 iterations and we generate random numbers in the range of 0 to 9,999.99. This is random with random function which returned a value between 0 and 1. This is the floating point number and it comes from the random module and then when we feed this random number into a square root function turning assuming the result in z Note that we don't do any assertions down here because the assertions up here have already take care of everything. When all test are successful, we print out done. So this is our test and the question is, what's going to happen? Is it the program terminates gracefully, output is done, all test are fast. Is it that the square root precondition fail? Is it that the square root post condition fails? Or thinking of Python, do we see something completely different. Over to you.

S. Thank you very much for your answer. Proof of the pudding is in the eating. So what we do is we simply press run and we press run again and we see-- we get an assertion error and indeed it is the post condition of square root that has failed. So the correct answer is the reduction square root post condition fails. So we saw that the post condition of square root mainly squaring the square root equals the origin of value failed. This is weird because all we use--we invoke the standard square root function of Python. Does this mean that Python square root function is wrong and now finally, through systematic random testing, we found the bug. Let me show you what's actually going on in here.

Understanding The Error

Why does the post condition of square root fail? The answer is simple but not so simple. What we lack here is diagnostic information. We don't know the values of r or z or anything that happens in between. So what we do i, we set up a framework that allows us to access these. So what we have set here is we have set up a try except block around the square root function, which catches the assertion and when an assertion is being raised, or any other kind of error is being raised, we print out the offending parameter r and we exit the loop, so let's see what happens here if we invoke the whole thing. So here's the bad value, 8436.9207865. Wow, magic value. So why can't we just square this. What's wrong here. So we print out the value and now we go and actually square the square root and multiply it by itself. Let's see what we got--again, we run the whole thing and 2123.01753488 and this is 2123.01753488. Well, by all means and measures, these two look identical to me. Why does the assertion above it failed. We can solve the mystery simply by looking at the actual difference between these two values. Let's see what the numerical difference is between the square root, square, and original value itself. Here comes another value which fails 9,049. something and here at the very end, you see the difference--it's 10^-12, meaning that 12 digits after the comma we do have a difference between r and the √r² The reason why this assertion failed as simply rounding error, the numbers are the same up to the 11th digit. At the 12th digit, it is actually are different and this is why the assertion failed.

Introducing Precision

P. How are we going to address this? We're going to introduce a parameter--the epsilon, which tells us the maximum deviation from the current result as an absolute number and we're going to give it a default value-- say up to seven digits after the comma is still fine. Now up to take this example and set up a suitable post condition in here, such that the difference between y² and x is never greater than eps.

S. So now here we have the correct assertion. We take the square of the square root compared with x and that absolute difference should be smaller than eps. We go and tick on run and if everything works well, our test should passed by now. See, prints done, which means that all assertions actually now have been satisfied. Just for the record, on my machine at this point, I can go and run a million test like this per second, which means that assertions can be a really great test device, compared that to the time it would take you to write a million number of test cases, which means that once you do have assertions, you can go and run test like crazy.

Assert Tag

Q. Assertions in the code are there to stay. We use them for catching errors, which we can also use doing testing, but they also serve as documentation tool for programmers because they show what the assumptions of individual methods are and they show what the individual method is supposed to produce. And let's apply assertions to another program, an old favorite of ours-- our buggy HTML stripper from the first unit. So here again we have the function to remove HTML marker, which we already know from the first unit. Here as well, a simple assertion would have sufficed to catch the error, mainly an assertion that checks for the non-existence of HTML tags in the output. Such an assertion could be something like this--we simply check whether there is any less than sign that is the beginning of an HTML tag in the output. We see, there's none and if there would be one, then the assertion would immediately fail. So here comes the quiz. With this assertion, which input would cause the assertion to raise an exception-- is it foo in bold, is it foo in quotes and bold, is it foo in bold and quotes, or is it foo in bold and single quotes. Which of these causes the assertion to fail? Over to you.

S. Thank you for your hard work. Let's check these inputs one by one. The first one, well this is just tags and no quotes--always has worked. The second one--well, the second one will have the quote stripped from the input, but this will not be caught by the assertion because the assertion only checks for the presence of HTML tags and these will be removed. The third one, however, will not have the HTML tags removed, so this is the correct answer. And the last one has single quotes and single quotes actually were as they should be. We can verify this within our programming environment. We can add a simple test here. This is the input which should cause the assertion to fail, press and run, and as you can see down here, the assertion actually failed. I leave it to you to try out the other three inputs.

Refining Assertions

Writing assertions is not always easy. You need to be prepared for a number of iterations until the pre and postconditions are accurate. A typical chain of tools looks like this. Here, we do have test inputs either random generator or systematic generator or coming from users or coming from production. These test inputs have been checked by preconditions, and the results of the computation will be checked by postconditions. Now, as it comes to preconditions and postconditions, you can be either too permissive or you can be too strict. If you are too permissive with your preconditions, this means that inputs, which shouldn't go into the function, go into the function and then result in arbitrary behavior. If you are too strict with the preconditions, this means that valid inputs will result in failing preconditions. The same also applies for postconditions. If you are too permissive with your postconditions, this means that you will not catch a number of bugs. If you are too strict, however, then you will catch bugs where there are no bugs. The key is to come up with the right balance between these two. We don't want to be too permissive. We don't want to be too strict. In theory, you like to have preconditions and postconditions that capture exactly what the program is doing. However, in practice, you will find that while writing preconditions is typically easy, writing postconditions or regular functions can be very hard. And therefore, it's very common for postconditions only to check a part of the actual state that are your postconditions that are simple but a bit too permissive that is the missed errors. Whereas for preconditions, it's frequently feasible to come up with the right level of precision. Why is that postconditions are so complex? To give you a simple example of why writing correct postconditions is not easy at all, Let me give you a very simple example. Let assume we do have a sorting function. The precondition for a sorting function is simple. Basically, we simply assume that x is a list. Well, we could put up an appropriate assertion here, but the precondition is to accept essentially anything. For the postcondition, we have to make sure two things. First, we assume that the list y that we will return actually is sorted, but we also need to make sure that the list we returned is a permutation of our input list. So let's go and check with the list sorted. We iterate over the elements of y. And if one element is greater than its successor, we return false. Otherwise, we return true. So what you see here is that in order to check the postcondition, we needed to come up with another function, which actually now should begin to be checked against what? The property that is sort to check here is essentially ordered to describe in here. Meaning that any assertion you could have here would be just the same as anything you have in here so you simply have to believe that this is true. Well, I didn't believe that I actually ran a number of tests, so I'm pretty confident that this works. Now, this is just a sorted. You also have to come up with its permutation. This is also something you have to define. Helper functions like a sorted or its permutation will probably be helpful later in the future because you may be able to reuse them for other purpose. They may even become part of a library.

Hard To Define

Q. Which of these post conditions will be hard to check? The audio level must not exceed 80 decibels, the animated cat should walk like a real cat, under specific atmospheric conditions, the sunset simulation shall produce a green flash, or response time shall be below 10 ms. Over to you.

S. And now for the answer to our quiz. Generally, anything you can measure can also be turned into post condition, and if at some physical condition, then it's typically very easy. The audio level must not exceed 80 db. While assuming we have a way to measure the audio level, which any descent audio software should have, yup--this is easy to turn into a post condition. Turn this into a post condition and you will save millions and millions of ears from damage. Next thing, the animated catch should walk like a real cat. Ah, real cat. If there ever is a specification of what makes a cat a real cat, I would be very, very delighted to see that. I'm pretty sure cats know precisely what to do and what not to do. For everybody else, that remain a mystery. Under specific atmospheric condition, the sunset simulation shall produce a green flash. Lo and behold, that's what a PHD student at my department did. He was specializing in computer graphics and his PHD thesis was perfect sunset simulation, and indeed, in reality under specific conditions, you can have a green flash during a sunset for just a second. I think this has been witnessed by only a few hundred people and only banned on film twice so far, and well having green or some amount of green in an image--it's actually pretty easy to check. So I am pretty sure he also had a post condition that were done and show that he was successful, and finally response time shall be below 10 ms. That's straightforward. Time is something that's easy to measure. This can also become a simple post condition.

Data Invariants

At this point we have seen that writing perfect assertions that cover all bugs and are pretty hard, but this shouldn't keep us from writing assertion that catch as many bugs as possible, and this is particularly useful in debugging. Since assertions are automated, they can check several executions of a function at once--actually all executions. However, they can also check large portions of data at the same time and all automated. What do I mean by large portions of data? Let me illustrate this by an example. Let's assume we want to implement a time class. A time that consists of hours, minutes, and seconds and we want to use assertions to automatically check whether a time object is consistent. That is whether the hours are in the right range, minutes are in the right range, and seconds are also in the right range. Here is our time class--we start with an initializer or a constructor that takes three arguments--the hour, the minute, and the second with default values of 0 for each and we assign these to individual attributes in hours, minutes, and seconds within a time object. Here's a number of inspector method that gives us access to the internal attributes-- hours, minutes, and seconds. After adding inspector methods that gives us the hours, the minutes, and the seconds, we now need a way to print out the time object. For this, Python provides a special method. It is called the repr for representation method. The internal method with two underscores before and after and what this thing does is this method is called whenever an object of the particular class is to be printed, and this returns a string representation of the object. So what does it do, we're using the string format method here, which takes a number of arguments, and format each of these arguments given to the format specification written here in that string. So for instance :2d prints the numerical argument here with two digits-- self minutes also comes in two digits, self seconds also comes in two digits. This format is not perfect yet. We actually wants leading zeros in here and we want hours, minutes, and seconds separated by colons. So here's a string, first the hours two digits leading zero, then the minutes two digits leading zero, and then the seconds two digits leading zero. Let's try out how this works I'm initializing a time object here 13 hours, 0 minutes, 0 seconds. We're using a 24-hour clock here and I'm printing this out. In printing this out should now invoke this repr method, which would automatically provide a nice string representation of our time. Let's run this whole thing and we get nice representation

Incorrect Input

Q. Now, all of these are fine and works well. There's no bug in here, but there's nothing in there which prevents misuse of my class. For instance, nothing prevents me from entering numbers in here which make no sense at all. For instance, I could enter -1 as the hour or -2 as the minutes or -3 as the seconds. This is not a valid time where we would have to come up with entirely new definition of what the valid time is. Whether we can still instantiate such an object and even printed out comes as -1, -2, -3. So here we're still talking three numbers, but since Python does not provide static checking of types-- that is checking of types when the program is being compiled or have been executed. There's noting that prevents me from passing objects which have a completely different type. So I could passed the string in here for instance and then hours would be initialized with the string-- this means that I would get a time object in here, which is entirely invalid. So my situation is--I'm passing some string here in the hours attribute. And now for a quiz, so if I do this, do I get an error when initializing the time object or when printing the time object or never.

S. We can find the answer to this question right within our environment. We first just run the initialization--that is just passing a string in here and we just run and nothing happens, no error message, so this is just fine. It's actually possible to pass a string object here. Now, when we try to print it out and run this whole thing, we get an error message because now what we have is that the format up here is no longer appropriate for an object of type string--that is Python does not know how to format a string with two digits with a leading zero. As you see, as soon as we are printing out the object, we get an error and therefore, the correct answer here is we get an error while printing.

Beware the Time Bomb

What you see here is an instance of a problem that's commonly known as a time bomb. A time bomb is infection in the code that's just waiting to explode into your face as a failure. Here's something that's definitely wrong and it's in there and it's sleeping. It can be there for million of cycles. Only when it's being access or process that's when the time bomb explodes and this is hard to debug because you have to figure out where the time bomb originally was planted and set that's where assertions save the day. An assertion prohibits time bomb by checking whether the data is same at the very moment it is being stored. When we are creating a timedobject, we could for instance use assertions to make sure that the arguments actually are all within specified ranges. For instance, we could say the hour must be between 0 and 23. At the same time, we give a hint to the user that time uses 24-hour format. We also want to make sure that the minutes and the seconds are within the proper ranges between 0 and 59, respectively. This special syntax you see over here is a speciality of Python which allows to coerce multiple comparisons into one. Actually now that I think of it, 59 is not correct here. There's years in which there's leap seconds and if there's a leap second, then there can actually be 61 seconds in a minute. So, in order to be perfectly correct, this needs to be 60 up here. This is still a very simplistic time class, adjusts its use, say local time for instance. We don't care about time zones or calendars or daylight savings time or anything at this point. The real time class is way more complicated than this one. Now, let's see whether these assertions help us in avoiding time bombs. We still have the string passed in here. Let's see whether our assertion catches this. Again, press and run. And see as we put in 2 minutes after midnight as a string, the assertion fails because the string is not between the values of 0 and 23. What happens if it pass negative values in here say hours is valid, minutes is negative, seconds is valid, press on run, and again we get a failing assertion-- now, we get the failing assertion that the minutes is not within the right range, which is exactly what we wanted to catch. With these assertions up here, any attempt to set any time object to an illegal state will immediately be detected.

The Exact Time

P. And again, if you think of a program as a succession of states that passing assertion can immediately tell you that there is large parts of the state where the infection cannot have taken place. If these were the time object in your state, you would know that all of them have correct values because none of their assertions has failed. One problem, however, remains without time class. You can still pass floating point numbers as arguments here and then get an error while we're trying to print the time. So here is your programming exercise, extend the code such that all parameters up here are automatically converted into integers.

S. Welcome back. We have the problem that we can't pass floating point parameters up here. And these were done later, result in errors while printing, so what we want to do is we want to convert them automatically into integers. How do we do that--well, we simply go along and use the Python function int, which automatically creates an integer out of the object that's being passed in here, and in order to make everything real perfect, we also do so in the assertions. And this is the answer to the quiz.

Introducing checkRep

We now have seen how we can check the input parameters through the time class. But what happens if there is more setters than getters which can all compromise the state? Let's assume we have a method advance which takes a number of seconds by which to advance the time. For instance, a value of 3600 would advance the time forward by 1 hour whereas a -60 would set back time by 1 minute. Here's our advance method. I skipped the details of the computation right now. But instead, we focus on the postcondition. We assume we do have a method seconds since midnight which returns the number of seconds that have a lapse in this very day, then we can set up a postcondition which simply said that after we advanced time, then the new second since midnight should be the old value plus the offset of seconds. All of this modulo the number of seconds in a day such that we'd probably take care other advances that go beyond or before midnight. The method seconds since midnight is rather straight forward to implement. So the hours times 3600 plus the minutes times 60 plus the seconds. So again, we use the helper function to help us define our postcondition, but for all of this to work, we need to make sure that even after the complex computation, the state of the time object is still the same. That is minutes, hours, and seconds are still within the right ranges no matter what happens in this complex computation and this is not checked at this point. What we need here is a property that always holds. In Computer Science problems, we call such a thing an invariant. An invariant is a condition that always holds for some data object. What does always means? Always means from the perspective of the user of that object. That is at the beginning and at the end of each public method. While the method is executing, invariants can and frequently must be violated. But when the method is done, invariants should hold again. For a time object, the invariant condition is already encoded here in the argument checker in the precondition of the initializer, but now we want to check this condition over and over again. At the end and at the beginning of each public method. What we do is we write an invariant checker. This invariant checker called checkRep here checks whether the internal representation is same or correct. In our case, we simply set up three assertions to check whether hours, minutes, and seconds are within the right ranges. Once we do have such a checker, we can now use this in all functions at the beginning or at the end as appropriate. For instance, after initialization, we invoke checkRep then make sure whether the internal state is within the right ranges. This is also true for setters like advance. Before we change something, we check whether the invariant is satisfied and we do the same after the computation such that if the computation in someway messes up hours, minutes, or seconds, then checkRep will immediately find this.

Implement checkRep

P. Let's whether you can come up with an invariant check of function yourself. As an example, we're going to use zip code. The zip code is what you need to address a letter to a specific address. The zip code is a number or in some countries a sequence of numbers and letters that identifies a specific region in some country. For our example, we're going to use American zip code, which in their most basic form are five digits, and those five digits go from five zero to five nine that is each of the five digits is between zero and nine. So, here's a zip code class. Again, we have a constructor, which takes the zip code which is a string consisting out of five digits. We set zip, internal zip attribute to the string--again, five digits. We also have a method to retrieve the zip code again as a string and now we need to come up with an appropriate representation checker invariant checker that checks whether the state is okay. We could, for instance, start with making sure that the length of the zip code is precisely five. Now, go and complete the invariant checker such that the content of the string is also checked such that each of the digits is between zero and nine over to you.

S. Here's the answer--what we need to do additionally in checkRep is we need to check whether every single element of the zip code is between 0 and 9, so we set up loop that iterates over the elements and check this for every single element of the string. We could also come up with a regular expression that ensures that we have a sequence of five digits between 0 and 9. I will leave that for the more advanced among you.

Red Black Trees

We have seen that with assertions we can ensure that objects are in the same state all the time. You can imagine that this would be most useful for large, complex data structures. That is, structures with lots and lots of invariance that need to be maintained or that could possibly be violated due to programming errors. Let me illustrate this with one of the most devilish data structures ever invented, which is a Red-Black tree. Here's an example of a Red-Black tree. A Red-Black tree is a data structure to represent associative arrays also known as mappings. If I want to search an element in a Red-Black tree, I can do so in an almost logarithmic time. And I'll do so as in all search trees by checking whether the element I'm searching for is smaller or larger than the element I'm looking at. Suppose I want to check whether the number 22 is in my Red-Black tree. I start at the root, which is 13. I'm looking for 22, which is larger than 13. So I go along the right branch. 22 is larger than 17 so again I go along the right branch. and here I end up exactly in the element I was searching for. Red-Black trees do not only guarantee search in logarithmic time but also insertion and also deletion, which makes them a real nice choice for all sorts of search operations. But they also are very, very difficult to debug. The reason they're difficult to debug is they do have a number of properties that must be maintained at all times. To start with, they always must be trees. You can't have any cycle in it. A black node must have red children, and a red node must have black children. The number of elements in the subtrees should be fairly equal with respect to the number of black nodes, and if some element points to a child the child maintains the pointer then this should point back to the existing parent. So search in Red-Black tree is easy but insertion and removal are really complicated. To get an idea just how complicated they are, let's take a look on Wikipedia what they wrote about Red-Black trees. So this is the Wikipedia page on Red-Black trees. Here's the insertion. These are two helper functions, grandparent and uncle. Now we go to 1, 2, 3 different cases all with individual code that needs to be handled. Here's case number 4, and here's case number 5. Removal is even more complicated. This is the formal description. We need a helper function called the sibling. Here's a helper function to delete one child, which is easy. And now come all the special cases—case 1, case 2, case 3, case 4, case 5, and case 6. And of course, you can imagine how easy it is to make mistakes in any of these cases. It would have some pointer pointing to the wrong node, and again it would easily create lots and lots of time bombs which is why we normally do not implement Red-Black trees but we rather rely on Red-Black trees as implemented in some library. However, if you were to implement a Red-Black tree, let's say part of a programming exercise for instance, it's cool. Or you're working on something that's similar to a Red-Black tree, but which doesn't have yet its page in the Wikipedia. Then you really need to make sure that you don't screw up things and that you don't create a time bomb in here. If you work on a data structure as complex as this, the very first thing you need to write in an invariant checker.

Red Black Trees and Debugging

Here's our invariant checker for the class Red-Black tree. You want to make sure that the root node is black. You can make a mistake. Let's say move the self-tree up here and all of the sudden the root becomes red. This assertion will take care of this. And as you can see here, we're setting up helper functions in order to break down the invariant into manageable small pieces. We want to make sure that there are no cycles in the tree. We want to make sure that pointers to parents are consistent. That is, if some node points to its parent then it should also be a child of this array parent. We want to make sure that in both subtrees the number of black nodes is equal. That's a nice long function name, which says it all. And finally, you want to make sure that red nodes have only black children and at the same time black nodes only have red children. I'm not sure whether we need to make this a separate method. Somehow these checkers are rather trivial. That is, one-liners. Others require a traversal of the anti-tree and would typically call themselves recursively. And now for every single method that change the state, you can now invoke the invariant checker. You do so at the beginning. You do so at the end. And when all of your assertions pass, then you can be sure that your Red-Black tree never ever becomes corrupted. However, if something goes wrong, it would get the error message at the very moment you make the Red-Black tree inconsistent. You will get the precise assertion that failed and you won't implant any time bombs because if the precondition is satisfied and if the postcondition is violated you know that it is precisely this method--the insert method in this case-- that caused the data invariant to be violated. This thing about Red-Black trees actually is a sort of a personal thing for me. Ten years ago, when I was a teaching assistant, my professor had the cool idea to have the students implement Red-Black trees as a programming exercise. And we as teaching assistants, had the cool idea to come up with a real hard automated testing for the student assignments, which turned out a disastrous combination because writing a Red-Black tree is one thing and writing a Red-Black tree that works is another. At this time, I did not really know how to debug and neither did my students and neither did my professor so we as teaching assistants would be sitting with our students for hours and hours and hours in the night stepping and stepping and stepping through the program and try to figure out where on earth on the umpteenth insert operation this particular pointer would finally be moved to this other location, which was a total pain to figure out. In particular with regular debuggers, we typically step over the whole thing and then finally lose the moment so you'd have to restart again and restart again. And if I think about how many hours probably days and weeks I have lost digging through these data structures whereas with a single invariant checker I could've found them in minutes I'm still somewhat sorry.

Invariants in Eiffel

Q. If you find that all these invariant checkers become tedious to write, you may consider a language where invariants, pre and post conditions, are all built in. This is the case in Bertrand Meyer's Eiffel language. In Eiffel, invariants become part of the classes. The invariant also unnamed, so if an invariant is violated, the name of the invariant becomes part of the diagnostic message. Here we have root has no parent and what this does is simply checks where the root attribute is a valid pointer--that is, it's not a void pointer. This is unequal in Eiffel, and this is what another language is called a null pointer. Here we have the invariant root is black and it simply checks the color attribute of the root note to be black and of course, there's more to it. Once you have defined such an invariant, the invariant will be automatically checked. Actually, it will be checked precisely according to the rules we set up previously. Do you still remember them? So when does Eiffel check its invariants? Is it whenever some class attribute is read, whenever some attribute is written, at the beginning of each public method in the class, or is it at the end of each public method? Check all that apply.

S. Okay. Now for the answer. When does Eiffel check its invariants. Well, it does precisely so as we defined before and as we recommended to check the invariants that is at the beginning of each public method and at the end of each public method. If you look at whenever some attribute is read or written, this also happens within the methods of the class and in particular--say you first try to some attribute, then some other attribute, then some other attribute. In between, you may have some inconsistency that will be discovered by the invariant. So while some method of the class is operating including reading and writing attributes, we don't want to check the invariant which is why these two must not be checked.

Assertions are Optional

Now, for a bit of advice on assertions. First of all, assertions take time. If for every single operation on a large data structure, you're to reverse the anti-tree in order to check it for consistency, these operations will be no longer logarithmic time but at least linear time. A check on a square root result will not matter that much, but if you need to traverse large data structures, it will. This is why assertions can be turned off. The way assertions are being turned on and off differs from the language used. In Python, there is a -O option which turns assertions off. -O stands for optimized. In C/C++, the compiler option -DNDEBUG for no debugging also turns assertions off. In Java, there is an option -ea for enable assertions which turns assertions on. Java special by having assertions turned off by default. Not very surprisingly, the assert keyword in Java is the least frequently used of all keywords in Java, which in my view is a big, big shame. The fact that assertions can be turned off and frequently will also implies that that assertions must not change the program semantics. Whether assertions are turned on or turned off, the behavior of the program should be the same. Suppose you're developing this cool map application, which also supports setting pins for individual locations. Now, you want to make sure that if you remove a location, the appropriate function returns the right value so just remove it. And if you're right, assert map.remove location equals to True, in order to make sure that remove location returns the true value that's dangerous because if the assertion is turned off here, then not only does the check go away but actually also the call to remove. Generally speaking, if anything in your assertion has side effects such as this one, don't put it into the assertion. What you do instead is you put the functionality into code that is outside of the assertion and then use the assertion only for checking the result. This way if the assertion is turned off has no effect then the semantics of the program will still be unchanged except that the checks will be gone.

Public Preconditions

Another implication is that you should not use assertions to check for public preconditions that are conditions that your program must check anyway to ensure the integrity of data coming from third party. As an example, if you have let say website or device, myaccount.com, where users can enter arbitrary amounts of data to be processed then do not use an assertion to check for the integrity of this data instead use a check that is not optional and will always remain active. In our example over here is something that you should not write. So here, we get an input, convert this into an integer, stall this, and deposit and then we use an assert to check this external data to make sure the deposit is greater or equal than 0. As soon as I turned the assertion off, this check will no longer be performed, and therefore, it would be possible to say to enter negative deposit numbers. What you should do instead is just a regular check. If you do have some invalid data here that comes from a third party raised an exception, a board execution popped up an error message or do whatever is most appropriate for the situation.

Pros and Cons in Production

Finally, it's always a good question whether assertions, in production code, should remain enabled or not. Here are some points to consider. First, failing is better than bad data. If your program fails with an assertion, you know that there's an error. If it simply produces a bad result without actually checking it, then this may have far worse consequences than failing. Second, the more active assertions there are, the easier it is for you to find the defect. Third, defects in the field are hard to track, which means an assertion gives you a chance to find the defect before it goes into the field, in particular, in combination with the raw testing. and even in the field, if assertion are still enabled, you still have a far better chance to find defects because, well, you get the failure rather than producing bad data. On the detractor side, so what speaks against assertions being amenably production code? We have performance, performance, performance. If you do have a large data structure with lots of global checks and leave this enabled, performance will suffer. However, there is large number of assertions which impact performance only in a marginal way. Checking some result is larger or equal than zero ranges like things. This will not impact performance in a measurable way. The answer here is to first leave the assertions on then measure which assertions actually do impact performance and possibly do turn these off and leave the others enabled. The second thing is asssertions, when they fail, typically caused the program to crash, or at least, abort execution immediately. This is not very user friendly. You should give your users a chance to recover even if an assertion finds that something has gone wrong. But, then again, if your program crashes because of a failing assertion, what's the alternative? The alternative is to keep on working with bad data and bad data can produce arbitrary and even worse results later.

Most Expensive bug in History

You may have heard the story of Ariane 5 rocket which famously exploded on its first flight due to software bug sometimes called the most expensive software bug in history. Do you know why the Ariane 5 exploded? It exploded simply because it had a bug in a 64-bit integer to 16-bit integer conversion. The interesting thing though is that normally the code actually did have plenty of checks to track such illegal conversion. These checks were disabled in the Ariane 5 for reasons of performance. Picture what would have happened if the computer in the Ariane 5 had been just so slightly be more performant to cover this particular check of a 64-bit to a 16-bit integer then the assertion would have triggered even during the launch, but the Ariane 5 actually does have a very good recovery mechanisms for failing assertions. It would have recovered, may be the flight could have done well and we would have escaped the most expensive software bug in history. $317,000,000 went into the sky because of a missing assertion.

Traffic Lights

Q. With all of these considerations, let's imagine we're working on a traffic light system. We have three colors in here, red, amber, and green. Now, we have a little piece of code in here, which impacted the traffic based on the traffic light. Let's assume this is part of some traffic stimulation. So, if the light is red, then the traffic should stop. If the light is amber, then the traffic should prepare to stop. And if the light is green, then the traffic should go. Now, being good programmers, we always assume that something goes wrong if the light is neither red nor amber nor green, which in principle shouldn't happen, which due to some bug could happen. What should be here where the question mark is? I'll give you four choices. First, we print out "This can't happen," on the console in the hope that somebody will eventually see it. Second, we raise an assertion exception. We knew that the program will immediately abort. Third, we write down assert false thereby clearly documenting that this is not expected. Last option, you write down pass, which is a Python statement, which does nothing at all. After all, this bug in the traffic light may well eventually go away at the next cycle when we actually go back into some defined state. Which of these four is the best option to go down here considering the pros and cons about assertion that we see over to you.

S. Thank you very much. Now let's go for the answer. Let us discuss the options one by one. First print this can't happen. Cool. It's going to go to the console. Probably going to go to some log but it is unlikely that anyone would ever notice this, and this means that our traffic light, well in our simulation, will be undefined stage, possibly meaning that the traffic will also be in some undefined state. What happens if we have multiple traffic lights and one of them is neither red nor yellow nor green? This is going to cause lots and lots of interesting issues. So simply printing this out to the console is not enough. This problem is too hard. We are violating the data invariant here, which also rules out the last option. That's not a good option. Now for the two remaining choices and this is a tough choice. We could have the assertion in here, assert fault, which would be nice but keep in mind that this could actually also be turned off-- for instance, when invoking the Python interpreter with -0. However, in this particular situation, there's no reason to turn this assertion off because well, if it gets executed then it raises an exception immediately and if it does not get executed then everything is fine. We mean that by turning it off, we don't get any savings. So the better option is simply to raise assertion exception plain and sample and thereby to document that something is going really wrong in the program. This will not hurt performance at all obviously because it will only be triggered when all other checks already have failed. However, if these were the production program, you may wish to come up with some global try catch block in the program or in Python this would be a global try except block that would catch all assertion exceptions and then pop up a dialogue saying "Oh we encountered a fatal error" possibly offering some recovery mechanism or some ability to save the progress that was we've seen so far such that the user is not confronted simply with all of his work disappearing.

System Invariants

Sometimes during debugging, there are situations where you want to ensure that some property hold throughout the entire execution. Let me give you an example for that. Suppose you have a very big Python program with dozens or even hundreds of modules. We suppose that due to some designed decision, there is one global variable called flag which is accessible from all these modules and all of these modules could possibly change the value of flag; the read flag, the write in to flag. But now, you have a run in which flag should not be set and yet it is set. And you have all these places in the program where the variable could be set, but you don't know where this is. What you want to have in this situation is a check that is executed at every moment throughout the execution of the program and which monitors the moment in which flag is set to true. From a previous unit, you may remember the tracing functionality in Python. By using the sys.settrace function, you can specify a function. Here, trace it, that it will be executed after every single line is executed in your program. And we can set up this trace it function such that after every line, it checks the status of the global variable flag and aborts execution as soon as flag becomes true. As any such tracing function, it returns itself such that it will be executed with the next line. As soon as you have such as checker in place, the flag will be checked after every single line of execution and you will be able to monitor exactly where the flag is set. And this of course is true for arbitrary data structures. If you do have some data structures say store, again, which can be accessed from lots and lots of places, then a call say to a store.checkRep will ensure that store is consistent for every single line that's executed. Just be sure to turn this out while store is executing its own methods because while these internal methods are running, store would naturally violate its own internal data invariants.

System Memory in C

In Python having arbitrary read and write accesses all throughout the program towards a single location is rather uncommon. There are languages; however, where any function that execute can access any part of the memory on purpose and sometimes by accident. This is particular true for languages in which memory is under control by the programmer. Languages such as C and C++. To illustrate why languages such as C and C++, memory management can be a huge problem. Let me show you a short piece of C code that can cause lots and lots of trouble. In C and also C++, you obtain memory by calling a special function which is called malloc for memory allocator. When you invoke malloc, it gives us an argument the number of bytes that you'd like to have for your data structure. Strictly speaking, it's the number of characters, but let's skip that. The effect of this statement is that you'll be getting a chunk of bytes and you will get a pointer that points to the beginning of this allocated memory. What you actually have here is an array of 10 characters. You can write these characters one by one. For instance, in location no. 5, let's store an x. In C, as an other languages, the array start with an index of 0. So 5 is actually the sixth element in the array. We can also read elements from the array. For instance, we can access the 10th element of the array and store the resulting character in y. The only problem is there is no 10th character. The array has only 10 elements and this would be the 11th element. The behavior of the C program at this moment becomes undefined. That is anything can happen. It is possible that some completely random value is stored in y. This is actually the most likely outcome. It is also possible that the program immediately stops. It is also possible that your program all of a sudden transforms into an adventure game and allows you to explore the colossal caves. That's a very unlikely outcome but keep in mind that in one release of the GLU Compiler, the GLU programmers had some fun and stating-- "Well, in undefined behavior, anything can happen, so let's simply go and make this a game." This was not very appreciated by the programmers.

Buffer Overflows

Needless to say, this behavior of C and C++ programs opens the door for many many ways of abusing the system. You may have heard buffer overflows, which exploit precisely this flow in C and C++ where people not only read but write beyond the elements of an array in order to supplant malicious code and select locations of the memory. This opens the door for all sorts of interesting hacks, of course. How can one detect such errors? What we need is a system invariant that continuously checks the boundaries of an array against reads and writes. What a tool can do for instance is constantly monitor the uninitialized areas for reads and writes with every single instructions that is. And whenever the program tries to access some system memory that is not allocated, what will happen is that the invariant checker raises an exception or otherwise aborts the program and therefore allows us to detect this kind of error. Tools for C and C++ when you do that include tools like electric fence, which is precisely that places these blocks in front and before every allocated block and therefore detects when reads and writes happened outside of these allocated areas. And the second important tool here is Valgrind, which actually is an interpreter for x86 binaries in which also allows us to monitor accesses to non-initialized code for C and C++ programs.

Day of Week Bug

Q. Buffer overflows are also the topic of one of my most favorite box story which was reported by Isenstadt in '97. He once had a system that would work only under very specific circumstances, sometimes it would, sometimes it wouldn't. The crucial parts of memory in this story are two variables, one is the day of the week and the other one is a variable which we will simply call ok. Ok can take a value which is either yes or no. By default in this story, ok would be set to no. However, ok has to be set to yes in order to work. Now, the day of the week in this eight character array would be something say like Monday, and on Monday, the program would not work. It also would not work on Sunday. It wouldn't work on Tuesdays either but there would be one day of the week and only on this day of the week where the program would work just fine. What is this day over to you. S. ???

First Steps

Q. Assume you have this big, big program with no random checks at all and you suppose to do the debugging. Where in that program should you start? I would suggest the first thing to do is to define data invariant. This will immediately cover large parts of the program state and catch lots and lots of defect. The next thing is to provide preconditions which check the data invariants, of course, but which also check specific preconditions for the functions at hand. Finally, provide post condition in any method you find suspect. Start with the partial conditions and then you expand them further and further to capture more and more of the correct behavior. Why do we start with data invariants in preconditions? Well, because they usually way easier to write, catch lots of bugs, and because we only care about whether a method works or not if it actually gets the correct argument and gets a correct state to begin with. On top of that, if you are using C or C++, run a system invariant check up. You should run the system invariant check up for the simple reason that it will check for all sorts of memory corruption, and if your program does have any issues with memory corruption, then all of these other assertions are totally mood because they will come up with random results. Running a two like Valgrind can detect lots and lots of memory issues and all it takes is to run your program once with Valgrind enabled. A colleague of mine recently transfer from academia to an oil and gas company and he was in charge of testing. He introduced the first assertion ever in their code and immediately, this one single assertion, uncovered dozens of bugs. The engineers were amazed. They have never seen anything like this before and this is an experience. Well, I'm not exactly sure whether you should have that experience too, but if you come along with code that has no assertion at all, start adding some and you will be surprised. Why should we start with data invariants? This is a quiz. First option, they cover much of the state. Second option, they are frequently checked. They form implicit pre- and post conditions because data invariants should hold at the beginning and at the end of each public method. Final option, they provide helpful documentation because they document exactly how the data structure is organized in which assumptions program should not violate. Check all that apply. Over to you.

S. Here comes the answer. Well, that actually was a rather boring quiz. Of course data invariants, if it's large data structures, cover much of the space. If these are data structures that are frequently used, they're frequently checked with every single use. Data invariants make up an important part of all sorts of pre- and postconditions. Finally, the assertions you write down here provide helpful documentation for programmers.

Inferring Invariants

Coming up with proper precondition, postcondition, and invariant can be a hard task. Other tools that help us to do that-- this is the idea of inferring invariants, having a tool that automatically provides data invariants, preconditions, and postconditions. I'm going to show you how such tools work, and we're going to explore how to build one. The Daikon tool by Michael Ernst and colleagues is a tool that dynamically detects invariants from program run. The idea is that you do have a set of executions. Here is one execution. Here is another one, and here is five executions. What Diakon does is it analyses these runs and checks whether there are any properties or variables that hold for all oberved runs. For instance, it could determine that the variable x is odd whenever the function f is being called. How does Daikon do that? The first thing it does is it gets traces. A trace is a listing of all functions that were called and all values of all variables. Very much like the sys.settrace() function in Python, which we have used to trace the programs. What Daikon has built in is a so-called pattern library. A pattern library of possible invariants. Here is such a pattern--$1 == 0. Here is such a pattern--we have a place holder which equals 0. Daikon now takes the trace, looks at all the variables, and checks which variable satisfies this pattern. That is it replaces the placeholder with every single variable found in the trace. It checks whether x == 0, whether y == 0, whether z == 0, and so on. Only those patterns that match are being retained. Those that don't match art eliminated. X and y are not 0. Then they're eliminated. This one is retained. Daikon checks these patterns for every single invocation of function and only retains those that hold for all invocations of the function, which means that over time the set of instantiated patterns becomes smaller and smaller. It's like a sieve. At the end, if this instantiation is found to hold for all invocations of a function then it's actually retained and finally reported as an invariant. Daikon has hundreds of these patterns. It tries them all out one by one on all variables at all invocations. Yes, this takes a bit of time. If, by applying the pattern library, Diakon has found out that x is always odd when f is being called, it reports this as an invariant.

Automated Inferring

Q. Let me show you how this works on an example. Here is a square root function. It takes x and returns the square root. Let's assume we invoke this with the values 2, 4, and 16. When we invoke the square root with the value of 2, we could infer that x has a value of 2 and eps has a value of 10^-7. However, these patterns would also be instantiated: x is smaller or equal than two, and x is greater or equal than 2, because these patterns also hold for the values that we observed. In the next iteration, we invoke the square root with the number of four. Now, the invariant of x always being 2 is eliminated. What we get now, however, is that x being less or equal than 4 still holds. We can do so by merging the earlier invariant with a new one. X greater or equal than 2 still holds for the new value. When we invoke the square root of 16, we now retain the invariant that x is less or equal than 16 and greater or equal than 2. This is what we get in the end. X is between 2 and 16, and eps is always 10^-7. For the postcondition, we get similar ranges for the return value. The return value is between the square root of 2 and 4, which is the square root of 16. However, what we also get is that the return value squared is equal to x, and we get this because Daikon has an appropriate pattern for that, namely a pattern where the multiplication of any two variables equals a third variable. This is instantiated with a return value, again with a return value, and with x, and this pattern then holds for all runs-- at least for all runs with integer numbers. If we put in floating point numbers, then eps also comes into play, because of rounding errors, and then this pattern would no longer be discovered. Whatever Daikon can produce is constrained to the pattern library it has, but if you add more patterns, then you'll be able to discover more properties, which will take Daikon a bit longer, though, to discover them. Still, even with a perfect set of patterns, approaches like these are dependent on the actual numbers that are being fed in there. What Daikon produces is relevant for all the runs observed, but we all know that the real precondition for the square root does not have specific range constraints on x except that x should be greater or equal than 0. Likewise, the return value of square root is not necessarily between the square root of 2 and the square root of 16, but it can actually be anything that's, again, greater than 0. Tools--for dynamic inference of invariants can work well if they do have a good test suite in the beginning. How can we get the correct ranges for x and the return value? By invoking square root with a value of 0? By invoking square root with a value of 1? By invoking square root with a value of maxint, where maxint is the highest available integer? Or by invoking square root with a negative value? Hint: You need multiple invocations. Check those which you need to get the correct ranges. Over to you.

S. Thank you very much, and now for the answer. The answer is if we want to have correct ranges for the input variables as well as for the output variables, we need to provide these ranges when calling square root. In our case, the correct range for x is 0 to maxint. If we invoke square root with 0, and if we invoke square root with the maximum integer, what we're going to get is that x is greater or equal than 0 and x is less or equal than maxint, which is actually the correct range. Likewise, we've learned that the return value is always greater or equal than 0, and it's always less or equal than the square root of maxint. If we invoke the square root with a value of 1, this won't change a lot. We will slightly expand the range, but we won't get the 0 in here. This is actually not needed. If we invoke square root with -1, then we will actually violate the implicit precondition. Let's hope that square root then will actually fail such that Daikon can then deduce that if we invoke square root with a negative number it fails. These two are the correct values.