CS259 Lesson1



Welcome to the first lesson on how debuggers work.

In this unit, I'm going to introduce you to the scientific method of debugging, a process in which systematic experiments gradually guide you towards the cause of the problem.

We're also going to see why talking to a teddy bear can be an effective method of finding out what's going wrong, and in the end, we're going to build an interactive debugger--enjoy.

Example Program

Every debugging session starts with a program that failed and that we need to debug.

For this purpose, let's build a simple program that we're going to use as an ongoing example. The idea is to write a function that takes HTML input, that is, text together with HTML markup, and returns just the text. For instance, if the input is the HTML markup for bold, followed by the text foo, followed by the HTML markup for end of bold, then we want to return just the text foo. Anything within angle brackets should be stripped away.

How do we deal with this? A simple way is to process the HTML input character by character and distinguish two modes. When we are in tag mode, we ignore all input. When we are in non-tag mode, we add all input to the output. We switch between these modes, tag and non-tag, by looking at the angle brackets. When we see the beginning of an HTML markup, that is, a less-than sign, we enter tag mode. When we see the end of an HTML markup, that is, a greater-than sign, we exit tag mode.

We can describe the behavior of our function as a finite state machine with two states: non-tag mode, in which we are processing text, and tag mode, in which we are ignoring HTML markup. When we see a less-than sign, we go into tag mode. When we see a greater-than sign, we exit tag mode. For all other inputs, we stay in the same state. When we are in non-tag mode and see any character that's not the beginning of an HTML markup, we add this character to the output, whereas in tag mode, we simply ignore the HTML markup that we process.

So when we are processing this HTML input, initially we're in non-tag mode. Now we see the beginning of an HTML markup, so we go into tag mode. We see the b, which is not the end of a tag; then we get the end of the tag and go back into non-tag mode. We see the f and add it, see the o and add it, see the o and add it again. So our output now is foo. Now again we see the beginning of a tag, where we ignore all characters up to the closing greater-than sign, and then our output is indeed the text inside the HTML markup.

Example Code

If you've taken a Udacity class before, you've probably seen the Udacity IDE. It's a webpage where you can enter arbitrary Python functions and Python commands and execute them. For instance, we can have it print hello, press Run, and then we get the output hello.

Let us now write a function that implements the finite state machine we have just seen in order to remove HTML markup.

We have the tag variable, which tells us which state we're in, and we have an out variable, which tells us what the output would be.

Here's an implementation of the finite state machine. If we see the beginning of an HTML tag, we set the tag variable to true, so we go into tag mode. If we see the end of HTML markup, we set tag to false, so we exit tag mode. Otherwise, for all other characters, we add them to the output unless we're in tag mode. And finally, we return the output.

Here's the input we just saw. We have a bold HTML tag, we have a foo as text, and we have an end-of-bold HTML markup in here.

If all works well, then we should get just foo as the output, and all the HTML markup should be removed. So we click on Run and we get foo as the output; that is, our remove_html_markup function has properly stripped the bold and end-of-bold markup from the input.
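The code shown in the video is not reproduced in this transcript. A minimal sketch that follows the description above, using the names tag, out, and remove_html_markup mentioned in the lesson, might look like this:

```python
def remove_html_markup(s):
    """Two-state machine: tag mode ignores input, non-tag mode copies it."""
    tag = False   # True while we are inside <...>
    out = ""      # the text collected so far

    for c in s:
        if c == '<':      # beginning of HTML markup: enter tag mode
            tag = True
        elif c == '>':    # end of HTML markup: exit tag mode
            tag = False
        elif not tag:     # any other character outside a tag is text
            out = out + c

    return out


print(remove_html_markup("<b>foo</b>"))  # prints foo
```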

Is It Perfect?

Q. And now for a quiz: which of these inputs is not properly stripped of HTML markup? Is it foo, is it a link to a page named foo.html followed by the text foo and the end of the link, is it an empty link, or is it a link to a page named greater than?

Check all that apply.

S. If you remember how our program works and go through the individual states of the finite state machine, you will find that the last input, the link to a page named greater than, is not processed correctly. What we have in this input is a character that can be mistaken for the end of a tag in HTML markup.

Our program does not know about the special meaning of double quotes, so it starts interpreting everything as HTML markup, but only up to the first greater-than sign. The double quote that follows is interpreted as text input. The second greater-than sign is ignored, even in non-tag mode. The text foo is handled just fine, and the remaining HTML markup is ignored. So what we should get as output is double quote foo, and you can see that the output still contains part of the original HTML markup.

So this is the correct answer. The others all work fine. Let me demonstrate this in the IDE. Since I now have double quotes in my string, I use single quotes in Python, which I can also use to delimit a string.

Rule of thumb: if your string contains double quotes, use single quotes as string delimiters. If your string contains single quotes, use double quotes as delimiters. If there are no quotes in your string, feel free to use whichever you like.

Here's our input with the greater-than sign and double quotes. We run the whole thing, and we see that the output indeed contains part of the HTML markup; that is, the double quote is still in there.
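The failing run can be sketched as follows. The function repeats the two-state version described earlier; the exact input string is an assumption based on the description "a link to a page named greater than".

```python
def remove_html_markup(s):
    tag = False
    out = ""
    for c in s:
        if c == '<':
            tag = True
        elif c == '>':
            tag = False
        elif not tag:
            out = out + c
    return out


# Single quotes delimit the Python string because it contains double quotes.
html = '<a href=">">foo</a>'   # assumed form of the quiz input
print(remove_html_markup(html))  # prints "foo
```

The first greater-than sign, which is inside the attribute value, already ends tag mode, so the double quote after it is copied to the output.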

A First Bug

The problem we're having here is that we're not taking care of the quotes in the HTML markup. Actually, anything that's within quotes in HTML markup should not end the HTML markup, in particular not the greater-than sign in here, so we need to extend our program appropriately.

The idea is as follows: rather than having two states, one for regular text and one for HTML markup, we're going to have a third state, which handles anything that's within quotes.

Just as before, we start in non-tag mode, and when we see a less-than sign, we go into tag mode. This is what we do when we process this very input, and we stay in this mode until we find a quote. That's when we go into quote mode, and we stay in quote mode until we find another quote. That's when we go back into tag mode, which happens right here. When we see the greater-than sign, we exit tag mode, and we process the individual characters and add them to the output. Thus our output should now become foo, as expected. The remaining tag is processed just as before.

In order to implement this, we're going to use two variables, one variable tag and one variable quote, to indicate the three different states our program can be in.

First Bugfix

Let's go and extend the Python code appropriately such that we now have three states instead of just two. So we're introducing a variable named quote, which initially is false, because in the beginning we are in non-tag mode. And now, we simply check for a double quote or a single quote, and if we see either of these, the quote variable gets inverted. And now, if quote is set, then the HTML markup characters should have no effect. So we make sure that the HTML markup characters only have an effect if quote is not set. Finally, we need to make sure that quotes only have a special meaning within tags, so we add "and tag" to the quote check.

We still have our example down here, which previously failed because the output was double quote foo. Let us see whether this input is now properly handled, and we click on Run. The output is foo. All the HTML markup, even this complex combination of quotes and end-of-markup characters, is now properly removed. Great.
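A sketch of the extended function, reconstructed from the description above (the video's code is not part of this transcript). It is written the way the description reads, with the quote check followed by "and tag". This is the version that the following sections go on to debug, so it should not be read as a finished implementation.

```python
def remove_html_markup(s):
    tag = False     # True while inside <...>
    quote = False   # True while inside a quoted attribute value
    out = ""

    for c in s:
        if c == '<' and not quote:      # markup characters only count outside quotes
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:   # transcribed as described in the lesson
            quote = not quote
        elif not tag:
            out = out + c

    return out


print(remove_html_markup('<a href=">">foo</a>'))  # prints foo
```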

A Second Bug

Q. And now for a quiz. There is some input which still produces output with HTML markup, so we still have a bug in this code. Is it foo in bold HTML markup, is it the same as above but with foo in double quotes, is it the same as the first but with the quotes outside of the markup, or is it the same as the first one but with the HTML tag names enclosed in double quotes? Try it out for yourself.

Which of these still includes HTML markup in the output of remove_html_markup?

S. And now for the answer. This is something that we can best find out by testing. Here's our first input, here comes the second, number 3 with the quotes outside, and number 4 with the quotes inside of the tags.

For the first input, the output should be just the text foo.

For the second one, it should be the same but enclosed in double quotes, and the same goes for the third one.

For the fourth one, we'd simply expect the text foo.

Let's go and test this. Surprise, surprise: the output is very different from what we expected. To start with, the third output still has HTML markup, but the second input also has something fishy going on, because the quotes that were actually part of the text have been removed. In our quiz, we only cared about the HTML markup, and the third input definitely produces HTML markup. Therefore, this is the correct answer.
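The test can be sketched like this. The function is the three-state version as reconstructed from the lesson's description, and the four input strings are assumptions based on the quiz wording.

```python
def remove_html_markup(s):
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:   # as described in the lesson
            quote = not quote
        elif not tag:
            out = out + c
    return out


# (input, expected output) pairs, following the quiz description
cases = [
    ('<b>foo</b>',     'foo'),
    ('<b>"foo"</b>',   '"foo"'),
    ('"<b>foo</b>"',   '"foo"'),
    ('<"b">foo</"b">', 'foo'),
]

actual = [remove_html_markup(html) for html, _ in cases]
for (html, expected), got in zip(cases, actual):
    status = "ok" if got == expected else "FAIL"
    print(status, repr(html), "->", repr(got), "expected", repr(expected))
```

In this sketch the second and third inputs are reported as failures: the second loses its quotes, and the third keeps its HTML markup.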

At this point our program is supposed to be complete, but it still has a bug. If we put in foo in bold, in quotes, as input, what we would expect is an output of foo in quotes. However, what we get is a completely different output: the quotes are stripped, but the HTML tags are still there. Obviously, the output does not match what we expect, so we do have a bug in here.

So here again is our program, and here's the one single input that goes wrong.

If you press on Run, what we get is the quotes removed but the HTML markup still there. When I was a student, I never got any formal training in debugging, so I had to figure out myself what went wrong. The only thing I knew was how to use debugging output, so I would use the print statement in Python in order to figure out what had gone wrong. Essentially, I would go and scatter print statements everywhere. For instance, I can go and print all the local variables in here. Cool. So I know what the character is, and I know what the status of tag and quote is.

Now, we'll click on run--oh yeah, and here comes my big output. Now, I can scroll down here and say, "Oh yeah, sure, sure, sure." Obviously, of course now you can immediately see what's going on in here. No, I'm afraid I can't because this is just a long, long list.

Think of a 1000-character input at this point. If you have a 1000-character input, you would have to go through 3000 lines of logs. This may help you, but it's a total time waster. You have to enter these statements, use them for debugging, and then remove them again. It is a total maintenance nightmare.
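The kind of debugging output described above might look like the following sketch. The print format is an assumption; printing each of the three local variables on its own line is one plausible reading of where three log lines per input character come from.

```python
def remove_html_markup(s):
    tag = False
    quote = False
    out = ""
    for c in s:
        # Scattered debugging output: three log lines per character.
        print("c =", repr(c))
        print("tag =", tag)
        print("quote =", quote)

        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:   # as described in the lesson
            quote = not quote
        elif not tag:
            out = out + c
    return out


result = remove_html_markup('"<b>foo</b>"')
print(result)
```

For this 12-character input the sketch already prints 36 lines of log before the result.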

Security Nightmare

Debugging statements left in the code may even become a security problem. In Mac OS versions 10.7.2/10.7.3, there was a security issue because a programmer had left debugging print statements in the code.

This would result in the following situation: you as a user would enter your password into the Mac, and the Mac would then let you in or not let you in, but at the same time it would store your password in the clear in a log that would be visible to anyone with access to the machine, which of course would again require a password.

On a multi-user machine, for instance, or with somebody having physical access to the hard drive, a mean attacker could now go and retrieve your password in the clear, simply because of some debugging statement left in there.

Do Not Debug Like That

Not only would I scatter arbitrary debugging statements everywhere, I would also try to debug the program into existence: just change things until they work. Let me try. The error may have something to do with quotes. So let me just go and remove all these extra quote checks in here and see whether this makes any difference. Okay. I removed all the quote checks.

Let's see whether it works right now, and we run the program and see--it's foo. The output is foo. It's almost correct. Only the quotes are missing.

Well, now we can handle the quote--maybe we will just remove this one line in here, and maybe we can change things until things are proper.

Let me see whether this works. Same. Oh yeah, great. Let me see. Okay.

And I can run the thing again. However, I think I removed too much at this point, because if you recall our original example, the one with the greater-than sign in the target URL, we now get the old error again: the quote is in the output, where it does not belong, which is why we introduced quote handling in the first place.

So now, actually, maybe I should go back to the earlier version, but how do I get back to it? Well, of course, I never backed up my earlier version, so I don't even know what the earlier version was. This more or less was my state of debugging. This is how I debugged as a student, which was not the perfect way to do it.

The Devil's Guide to Debugging

All of these strategies come from a chapter named "The Devil's Guide To Debugging" in Code Complete, a book written by Steve McConnell in 1993 which encompasses all the wisdom about programming that he had at that time. (There is a second edition of Code Complete from 2004).

What we've seen so far is just the first three of these rules.

  • Scatter output statements everywhere in the code.
  • Debug the program into existence. Just keep on adding statements and removing statements until it works.
  • Never ever back up earlier versions of your code.
  • Don't bother understanding what the program should do.
  • (My all time favorite). Use the most obvious fix. That means fix the symptom instead of the problem.

Most Obvious Fix

This technique of using the most obvious fix is so beautiful that I'd really like to show it to you. So here we have our complete program again; I managed to restore it from a previous backup. And again, if we run this, you'll see that the HTML markup is not removed.

So the technique of using the most obvious fix is very simple. We simply check for the input that's wrong and return the correct output. Lo and behold, this technique works beautifully and returns the correct output. Rumor has it that some programmers use this very technique to make their unit tests pass.
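As a sketch of what such a "fix" looks like (an illustration of the anti-pattern, not a recommendation), the function below special-cases the one input known to fail. The rest of the body is the three-state version as reconstructed from the lesson's description.

```python
def remove_html_markup(s):
    # "Most obvious fix": hard-code the known failing input. This treats
    # the symptom only; the defect below is untouched.
    if s == '"<b>foo</b>"':
        return '"foo"'

    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:   # as described in the lesson
            quote = not quote
        elif not tag:
            out = out + c
    return out


print(remove_html_markup('"<b>foo</b>"'))  # prints "foo"
print(remove_html_markup('"<i>foo</i>"'))  # prints <i>foo</i>
```

The second call shows why this only hides the symptom: any other quoted input still fails in the same way.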

Before Fix

Q. As with any devil's guide, you should of course not do what's in here; you should do the exact opposite. This means you should fix the problem, not the symptom; understand what your program should do; and finally, proceed systematically.

Let me come up with a quiz here: before you apply a fix, what should you do? Understand what the problem is, understand what the code should do, and/or be able to predict how the fix would address the problem? Over to you.

S. Indeed, the answer is that you should understand what the problem is, because if you don't understand what the problem is, how would you be able to fix it?

Understand what the code should do, because again, if you don't know what the code should do, you won't know how to fix it.

And finally, be able to predict how the fix addresses the problem, so that you apply a fix that addresses not only the symptom but the actual cause.

Defect vs Bug

Now let's go and proceed a bit more systematically. The situation we are in is a classic one. We have a program, the program gets some input, and out pops a failure that can be seen by the user.

What is a failure in here? A failure is an error that is externally visible, that is, visible to the user.

What is an error? An error is something that deviates from what's correct, right, or true. Errors are typically unwanted and unintended. A failure is an error. A defect is also an error. A defect, however, is somewhere in the code.

The defect is what in the end turns otherwise valid input into a failure. The term defect, an error in the code, has several synonyms. Some people also call this a fault, but the term fault can also be applied to data. By far the most common word for defect, however, is of course bug.

"There is a bug in my code somewhere", as if it had somehow crawled into my code. Here is this foreign bug; this is my code, and it doesn't belong here.

That's okay, but in the end, it's us programmers who are actually introducing these bugs, and so I very much prefer the term defect.

The aim of debugging is to find the defect that causes the problem. For this, I want you to look a little bit deeper into how a program actually executes.

How Failures Come to Be

We can see a program execution as a succession of program states. Each program state consists of several variables with values. As a program executes, it processes these states and transforms them into new states, for instance by reading variables and writing variables. This is the normal mode of operation.

Now, however, since in the beginning we have a normal input and in the end we have a failure, there must be a defect somewhere in our program that actually causes the problem. So let me assume that this statement we're executing here actually has a defect. What happens is that, when executed, it introduces an error in the program state, which we call an infection. This infection is now propagated, possibly to other states, and eventually becomes visible to the user as a failure. What we get in here is actually an entire cause-effect chain.

You see that the failure, which is an infection, is caused by an earlier infection. If we are at a state where the infection has no further origin, that is, the input state is sane and the output state is infected, and we know the statement that was executed at this precise moment and caused this transition from a sane state to an infected state, then this is the statement which caused the infection; this is the statement which has the defect.

When we're debugging, we need to identify this cause-effect chain. Not only do we need to identify it, but we also need to break it. If we can break this cause-effect chain from defect to failure, then we're done with debugging.

So all of this looks very simple; however, in real life, it's much more complicated than that. To start with, not every defect automatically causes a failure. It may well be that the defect causes an infection which later simply is not propagated, just as with a real-life infection. So the infection never becomes visible to the user and does not cause a failure at all. Or the statement with the defect may not even be executed, or it may cause an infection and later a failure only under very specific circumstances. This is the problem of testing: you can execute a program again and again, never have a failure, and still have a defect in there. However, if a program fails, that is, if we actually see a failure, then we can always trace it back to the defect that causes it.
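To make the distinction concrete, here is a small hypothetical example. The function largest is not part of the lesson; it is made up for illustration. Its defect is executed on every call, yet a failure only shows for some inputs.

```python
def largest(values):
    """Intended to return the largest element of a non-empty list."""
    best = 0              # defect (in the code): should start from values[0]
    for v in values:
        if v > best:
            best = v
    return best


# The wrong initial value is overwritten before it reaches the output,
# so this run passes even though the defect was executed.
print(largest([1, 5, 3]))    # prints 5

# Here the wrong initial value survives to the end and becomes visible
# to the user as a failure.
print(largest([-3, -1]))     # prints 0, although -1 would be correct
```

A test suite containing only lists with positive numbers would pass on every run and still leave the defect in place.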

So if there's a failure, we can always fix it by following back the cause-effect chain. But then the next problem is these states are huge.

So over here we have 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 variables. Cute. In reality, we have 10,000 such variables, and not only that, we also have 10,000 steps between defect and failure. So tracing back the cause-effect chain can be much, much more complicated than it is in this simple picture.

The longer the cause-effect chain, that is, the more time and the more states we have to cover, the harder it is to debug; and the larger the state, the more we have to search for an infection. Again, this makes debugging harder and harder.

It's like finding a needle in a haystack except that the haystack sometimes is larger than any haystack you'll ever find on earth.

Infections and Failures

Q. ???

S. Here's the answer to our tricky word quiz. Let's go through these answers one by one. First, every infection can be traced back to the defect that causes it. This is actually correct, because if we have an infection, that is, an error in the program state, we can find out which defect causes it, namely the piece of the program which got a sane state as input and produced an infected state as output.

Second, every execution of a defect causes an infection. That's actually not the case. It may well be that there's a defect in the code which works fine most of the time and only under certain circumstances produces an infection. So this is not correct. Third, every infection ends in a failure. If only it were so, it would be far easier to make a program defect free. No: like real infections, they can die out before they ever cause real harm.

Finally, every defect can cause a failure. How? That's tricky. Remember, a defect is an error in the code, a failure is an error in the execution, and an infection is an error in the state. You can have an error in the code even in code that's never executed or in code that's actually unreachable. How can it be an error if it can never be executed? Well, it may well be that over time, as the program is maintained, the code becomes executable, and then it can actually cause a failure. But it is not the case that every defect can always cause a failure, because defects can be in unreachable code, and there they may not cause a failure. So this is not correct either.

Isaac Newton and Apples

Now that we know how failures come to be, let's look into how to systematically find their causes. What we need is a way to organize the debugging process, and for this, we use a process called the scientific method. That's a cool name, isn't it? It's actually a big name for a little thing, but this little thing has big consequences.

What the scientific method does is provide a systematic process by turning some aspect of reality into a theory that explains how this aspect of reality works, and that makes exact predictions on this reality.

Let me come up with a simple example of how the scientific method works. Suppose you're a human, say, Isaac Newton. You're lying in your orchard, and all of a sudden you're watching an apple fall down. So now you wonder: well, I've just seen this apple falling. This is an aspect of reality. Why do apples fall down?

What the scientific method does is provide you with a way to come up with the most general prediction method available. So if an apple falls, and you can repeat this experiment over and over by letting an apple fall, you may also ask: well, do bricks fall? Here's a brick and, lo and behold, it falls. Do plates fall? Oh yes, they do. Do cups fall down too? Oh yes, they do. Put yourself in the mind of a 3-year-old and you will find that everything falls down, and everything really means everything. So I can easily generalize that everything falls down. Unless you happen to see, say, a balloon. Here's the balloon and, lo and behold, the balloon goes up. So not everything falls down, but you can easily extend your theory to say: well, a balloon is lighter than air. If the air falls down, then the balloon must go up. So still everything falls down. What does down actually mean? Down means towards the center of the earth. Everything is attracted towards the center of the earth. Well, the sun is not attracted to the center of the earth. It's rather the other way around.

So you see that from these observations, and even experiments that you can conduct, you will eventually be able to come up with a full-fledged theory that explains where things will be moving, at which velocity they will be moving, and what the acceleration will be.

In the case of Isaac Newton, this became the theory of gravitation.

The Scientific Method

In the beginning, you do have some initial observation.

From this initial observation, we derive a hypothesis. A possible explanation for what you just observed.

If the hypothesis explains your observation, you can also use it to make a prediction. In our example, the initial observation was the apple falling. The hypothesis would be: well, every solid object falls down. And then a prediction would be: if I take this plate and let it fall, it's going to go straight down towards the floor. It may also break along the way. This is about the level of a 3-year-old.

The experiment means to actually verify the prediction, so we take the plate and let it fall. From the experiment comes the observation. What do you see? Well, the plate actually was attracted towards the ground. You can now check whether what you just observed in your experiment is as predicted. If it is, we say it supports the hypothesis, and we can then refine the hypothesis. In our case, this means we can generalize it towards other objects and come up with a more general hypothesis.

If the observation, however, is not in line with your prediction, then the hypothesis is rejected, and if the hypothesis is rejected, you need to come up with a new hypothesis. This cycle goes along from hypothesis through prediction through experiment through observation, gradually refining the hypothesis or coming up with alternatives.

This is something you repeat again and again until your hypothesis becomes consistent with all observations and has so much predictive power that it becomes what one calls a theory.

In short, you repeat the process of refining the hypothesis through prediction, experiment, and observation, or of creating new hypotheses, going through the cycle again and again until your hypothesis becomes a theory, that is, a predictive and comprehensive description of some aspect of reality.

What is a Theory

Q. What is a theory — is a theory a vague guess, is it a framework that explains and predicts observation, is it a particularly useful hypothesis, or is it the outcome of the scientific method.

Check all that apply.

S. Now, for the answers. A theory is the outcome of the scientific method. A theory is also a particularly useful hypothesis, because we refine the hypothesis until it becomes a theory.

And it also is a framework that explains and predicts observations. This is the official definition of theory.

A theory, however, is not a vague guess when you're talking about the scientific meaning of this theory.

When you have scientists speaking about a theory, they're talking about a framework that is consistent with all earlier observations and predicts lots and lots of future observations, and it's actually the best framework available in most cases.

When you have laymen speaking about a theory, it's more like "I have this theory on how things could be", which does not necessarily have the same strength as a scientific theory. The misunderstanding between theory in the sense a layman uses it and theory as scientists use it is the cause of lots of confusion in political debates.

Think about the theory of evolution. For scientists, the theory of evolution is exactly an explanation and prediction of lots and lots of observations: a very consistent and, for many, beautiful framework. Others say, oh, this is just a theory of evolution, dismissing it as a vague guess. Be careful when you hear the word theory. Its meaning depends very much on whoever says it. In our context, of course, we're using a theory in order to explain what the cause of a failure is, and in our context we're not looking for vague guesses; we're looking for something that explains all earlier observations and predicts future ones.

Bugs as Natural Phenomena

When we are debugging, we proceed the very same way. Indeed, we're treating bugs as if they were natural phenomena.

So, our hypothesis would be grounded in the initial observation of the failure and possibly also in all the knowledge we have about our program. The failing run, more runs, the original failure, the program code: all of these go into forming a hypothesis.

At the end of the process, we get a consistent description of how the failure came to be. We don't call this a theory. Theory is a bit far-fetched for a simple failure. Instead, we call this a diagnosis of the original failure.

It may sound a bit far-fetched to apply a method that has been devised for studying natural phenomena, and for coming up with theories about natural phenomena, to something as artificial as bugs. However, errors and nature have something in common: neither of them is under our control. And the scientific method is precisely the method you need for explaining something that is not under your control.

Apply The Method

Let's now apply the scientific method to our Python program which removes HTML markup. First of all, let us write down again what we have observed so far: what the input was, what we expected, and what the output was.

If our input was foo in HTML markup, then we would expect the output foo, and the actual output was foo, so this is just fine. If our input was the same thing in double quotes, we would expect the double quotes also to appear in the output, but instead the quotes are stripped and the HTML markup is still included in the output; that is, the whole thing fails. These are all the observations we have made, and from them we need to come up with a first hypothesis on what causes the error.

Q. So, here is a quiz. Which hypotheses are consistent with our observations so far?

Check all that apply.

Is it that double quotes are stripped from tagged input, is it that tags in double quotes are not stripped, is it that the tag for bold is always stripped from the input, or is it that four-letter words are stripped?

S. Let's check our individual hypotheses.

So double quotes are stripped from tagged input. Well, here are double quotes. They're stripped. This is correct.

Next, tags in double quotes are not stripped. Well, if we have tags over here, they are stripped. If they are in double quotes, they are not stripped. So this is correct as well.

The tag bold is always stripped from the input. Not the case. Let's see over here. So not correct.

And four-letter words are stripped--well, actually we don't know this because we don't have a four-letter word in here, but at this point, there's no reason to believe that four-letter words would be stripped.

You can still try this out yourself. What we have now is two hypotheses which are consistent with our observations so far. These may be two separate issues, but chances are that they are actually tied to each other. Let's focus on the first hypothesis because it's simple.

Hypothesis I

If this is our hypothesis, we now must devise an experiment to further refine this hypothesis.

Let's come up with a very simple input where we'd assume that this holds. If we put in just "foo" without any tags, we would assume that we would get the very same output. Let's do this as an experiment. Here again, we have our buggy function, and now let's conduct the experiment. So we invoke print remove_html_markup with "foo". We press Run, and the output is foo without double quotes.

So now we see the output is not what we expected: the output is foo without quotes, and this again confirms our hypothesis. We can try this out with even more strings, further strengthening our hypothesis. If we put in "bar", not very surprisingly we get bar as output, and if we put in just the two quotes, what we get is an empty string, so we get more failures along the way.

This hypothesis is now confirmed by a number of experiments: double quotes are stripped from tagged input. But actually we didn't even have any tags, so we can scratch that: double quotes are stripped from general input. So even if there are no tags, quotes are being stripped away.
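These experiments can be sketched as follows, again using the three-state function as reconstructed from the lesson's description. The inputs are Python strings that themselves contain double quotes, so single quotes delimit them.

```python
def remove_html_markup(s):
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:   # as described in the lesson
            quote = not quote
        elif not tag:
            out = out + c
    return out


experiments = ['"foo"', '"bar"', '""']
results = [remove_html_markup(s) for s in experiments]
for s, r in zip(experiments, results):
    # Hypothesis: double quotes are stripped even from untagged input.
    print(repr(s), "->", repr(r))
```

In each case the output lacks the double quotes of the input, which is consistent with the refined hypothesis.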

Now we can actually go and explore the cause-effect chain. If quotes are being stripped away, there must be a place in the code which does that. The only place in our code where quotes are handled is right here in this line: if there's a double quote or a single quote and tag is set, then quote should be inverted, and when quote is inverted, tags should not be recognized. Nothing too complicated, but why would the quotes be stripped, given that normally we should not be in tag mode?

Maybe we are in tag mode? Maybe the variable tag is set? So we can come up with a new hypothesis that states precisely that.

Hello Assert

Q. The error is due to the variable tag being set. How do we know that this variable is being set?

Let me introduce you to one of the most powerful debugging tools ever invented which is the assert statement.

The statement assert, followed by a condition, evaluates the condition and aborts the execution by raising an exception if the condition is false. That is, if the condition holds, we proceed as usual; if the condition does not hold, we throw an exception.
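As a small illustration of this behavior, here is a sketch in Python 3. The helper name run_with_assert is hypothetical and only serves to show both outcomes without aborting the script.

```python
def run_with_assert(condition):
    """Report what an assert on the given condition does (hypothetical helper)."""
    try:
        assert condition, "condition does not hold"
    except AssertionError as exc:
        # The assert raised an exception instead of letting execution continue.
        return "aborted: " + str(exc)
    return "proceeded as usual"


print(run_with_assert(1 + 1 == 2))  # condition holds
print(run_with_assert(1 + 1 == 3))  # condition does not hold
```

Note that Python skips assert statements entirely when it is started with the -O option, so experiments like this should be run without it.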

With this statement, we can now go and check the value of tag all through the loop. Again, our hypothesis is that tag is being set, and we use assert to check that. With the statement assert not tag, should tag ever be set, we will immediately get an exception, and again we can check this with foo enclosed in double quotes. So in order to confirm the hypothesis, we would expect an assertion exception. What will the output be then? Let's make this a quiz.

Now that we have changed the program to include assert not tag, what's going to happen? Does the program raise an exception, or is it the case that the output is still foo as before, that is, the assertion is not violated and tag is not set during the entire loop? Over to you.

S. Here's our program, with the assertion included and with our input as before. If tag should be set somewhere during the execution of this program, then the assertion should fail. So we press run and we see--oh, the output is still foo. The assertion has not failed, and this shows that during the execution, tag has never been set to true. So the correct answer is that the output is still foo. Since we do not get an exception, we can now reject our hypothesis. So we know that tag is not being set. Tag is always false. So let's go back to our code.
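A sketch of this experiment, assuming the buggy function looks the way the lesson describes it (the body is a reconstruction, not the lecture's exact code), with the assertion placed at the top of the loop body:

```python
def remove_html_markup(s):
    """Buggy version (reconstructed) with the experimental assertion added."""
    tag = False
    quote = False
    out = ""
    for c in s:
        assert not tag  # experiment: fails as soon as tag has been set
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out


# No exception is raised for this input, so tag was never set.
print(remove_html_markup('"foo"'))
```

The assertion only stays silent because this input has no tags; for an input such as <b>foo</b> it would fail, which is why it is removed again after the experiment.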

Hypothesis II

Q. Here's the only place where quotes are handled. Tag is always false. This condition says that only if we see a quote and tag is set should we go into quote mode. So it should actually never hold, but maybe there's something wrong with this condition.

So let's come up with a new hypothesis. Our new hypothesis is the error is due to the quote condition evaluating to true.

Let's come up with an experiment to verify this hypothesis. If this condition evaluates to true, then the next line should be executed. So what we do is, we simply write an assert false in here, meaning that the assertion will automatically fail if this ever gets executed. So this piece of code should actually never be reached when executing the program. Our expectation this time is that, with this input, this assertion should fail. Again, we make this a quiz after we have inserted assert false.

Does the program now raise an exception, meaning that the quote condition holds and, therefore, something's wrong with that condition? Or is the output still foo, that is, the input with the quotes stripped, in which case the quote condition obviously does not hold because the assert false is never reached? Try this out for yourself.

S. Here comes the answer--as you probably found out for yourself, the program indeed raises an exception, that is, the quote condition holds, and this confirms our hypothesis. So at this point, let us recall what we have seen. The error is due to the quote condition evaluating to true. We've seen that and confirmed it by experiment. Before that, we had already seen that the hypothesis that the error is due to tag being set was rejected--that is, tag is not being set.
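This second experiment can be sketched as follows. As before, the function body is a reconstruction from the lesson's description, and the variable name outcome is only there for illustration.

```python
def remove_html_markup(s):
    """Buggy version (reconstructed) with assert False in the quote branch."""
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            assert False  # experiment: reached only if the quote condition holds
            quote = not quote
        elif not tag:
            out = out + c
    return out


try:
    remove_html_markup('"foo"')
    outcome = "no exception: the quote condition never held"
except AssertionError:
    outcome = "AssertionError: the quote condition held"
print(outcome)
```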

Our initial hypothesis and our initial observation was that double quotes are stripped from the input. So we know at this point that double quotes are stripped, tag is not being set, and the quote condition evaluates to true.

Another Hint

Let's take a look again at our example. We see that the condition in here not only handles double quotes but also single quotes. Question is, is there a difference in how the condition handles double quotes and single quotes?

We still have this hypothesis characterizing the error. Let's see whether it generalizes to arbitrary quotes, that is, quotes in general are stripped from the input, whether they be double or single. In order to check this hypothesis, we now use the input foo in single quotes. A correct program would keep the single quotes in the output; if the hypothesis holds, they will be stripped just like the double quotes. Here again we have our program. We first need to remove the assertions in order to restore it to its original state. And now, we're going to invoke this with foo in single quotes, and again, we need to enclose this in quotes to make it a Python string.

General rule of thumb: if you have single quotes in a Python string, you need to enclose it in double quotes. If you have double quotes in a Python string, you need to enclose it in single quotes. And if you have both, you need to put together multiple strings, some enclosed in single quotes and some in double quotes.
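The rule of thumb can be sketched like this. The variable names are made up for illustration, and the backslash-escaped variant at the end is an alternative that the lecture does not mention.

```python
single_inside = "'foo'"  # single quotes inside, so double quotes outside
double_inside = '"foo"'  # double quotes inside, so single quotes outside

# Both kinds of quotes: join several literals, each enclosed in the other kind.
both_joined = "'foo'" + ' and ' + '"bar"'

# Alternative (not from the lecture): escape the enclosing quote with a backslash.
both_escaped = '\'foo\' and "bar"'

print(single_inside)
print(double_inside)
print(both_joined)
print(both_joined == both_escaped)
```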

So, here we go: we put foo in single quotes into the program, and now let's see what happens. Surprise--what you see here is that if you put in single quotes, they are not stripped, whereas, just to recheck, double quotes are stripped from the input. That's an important hint. Double quotes are stripped and single quotes are not. So the output actually contains the single quotes, which means that our hypothesis up here is rejected--the single quotes are not stripped.
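Here is a sketch of this comparison, again assuming the buggy function as reconstructed from the lesson's description:

```python
def remove_html_markup(s):
    """Buggy version, reconstructed from the lesson's description (an assumption)."""
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out


single_quoted = remove_html_markup("'foo'")  # single quotes survive
double_quoted = remove_html_markup('"foo"')  # double quotes are stripped
print(single_quoted)
print(double_quoted)
```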

Fixing the Bug

Q. So what we have at this point is that the condition in our code becomes true when we see a double quote and it becomes false when we see a single quote. At this point, we should have enough material to solve the problem.

What's the correct way to fix this condition? Should it be a check for double quotes and single quotes and tag? Should we invert the condition such that not tag is being checked? Should we put parentheses around the or condition? Or should it be none of the above and something completely different?

S. This is the correct way to write this condition: with parentheses around the or. In Python and most other programming languages, or has lower precedence than and. If you write this without parentheses, it is implicitly grouped like this, meaning that the condition that tag must be set applies only to single quotes but not to double quotes, which would always be stripped. And this is the reason why double quotes are stripped whereas single quotes are not. Well, they are stripped, but only when we are in tag mode, and when we are in tag mode, the characters won't appear anyway.
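The effect of the missing parentheses can be checked in isolation. In this sketch the two helper functions are hypothetical; they only wrap the condition as the lesson describes it, once without and once with parentheses.

```python
def buggy_condition(c, tag):
    # Python groups this as: c == '"' or (c == "'" and tag)
    return c == '"' or c == "'" and tag


def fixed_condition(c, tag):
    # Parentheses make the check for tag apply to both kinds of quotes.
    return (c == '"' or c == "'") and tag


for c in ['"', "'"]:
    for tag in [False, True]:
        print(repr(c), "tag =", tag,
              "buggy:", buggy_condition(c, tag),
              "fixed:", fixed_condition(c, tag))
```

The only row in which the two versions disagree is a double quote outside of a tag, which is exactly the failing case.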

We can apply this very same fix in our program simply by putting parentheses around this disjunction in here, such that the disjunction takes precedence over the conjunction. Let's see whether our example with the double quotes now works. Yes, it does. Now we actually get the quotes properly in here. Let's try a few more.

So let's see whether the single quotes still work too, and while we're at it, we can even add a few more--let's see what the output will be.

First example: the double quotes are still there, not stripped. Second example: single quotes, not stripped. Third example: just the HTML markup; the HTML markup is stripped. Next one: quotes around HTML markup; the quotes remain, the HTML markup is stripped.

So HTML markup is removed in all cases. How about our most complex example, where we have the reference with quotes within the tag? Is it still properly stripped? We try this out. Yes, everything is properly stripped. Quotes and greater than signs are handled just as they should be.
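The checks above can be sketched as follows. The function is the reconstructed version with the parenthesized condition, and the concrete input strings, in particular the one with a greater than sign inside a quoted attribute, are assumptions chosen to match the cases described.

```python
def remove_html_markup(s):
    """Fixed version: quotes only matter while we are inside a tag."""
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:  # parentheses added
            quote = not quote
        elif not tag:
            out = out + c
    return out


examples = [
    '"foo"',                  # double quotes, no tags
    "'foo'",                  # single quotes, no tags
    '<b>foo</b>',             # just HTML markup
    '"<b>foo</b>"',           # quotes around HTML markup
    '<a href=">">foo</a>',    # quoted greater than sign inside a tag
]
for text in examples:
    print(repr(text), "->", repr(remove_html_markup(text)))
```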

What Did We Do

What is it we have seen? We have started with an initial hypothesis.

From this hypothesis we have made predictions. We have made experiments. We checked whether the experiments validated the predictions. If they did, we refined our hypothesis. If they didn't, we came up with an alternate hypothesis.

We have repeated the process until we came up with a diagnosis. That is, a theory that is consistent with our earlier observations, and that also predicts future observations. In our case, the correct behavior.

Based on this diagnosis of how the failure came to be, we have fixed the code accordingly, and we have seen in our additional experiments that the theory holds. That is, the diagnosis was valid from the beginning, and therefore was a correct explanation of how the failure came to be.

Note that there are many ways to come up with an initial hypothesis. There are also many ways to refine a hypothesis and reject a hypothesis. Note that there may be multiple hypotheses to start with, and also of course during the scientific method, you may come up with different hypotheses.

If you have two competing hypotheses for the same failure, what you do is set up an experiment that decides which of the hypotheses is supported, and which of the hypotheses is rejected.

At the end, there should be only one hypothesis, there should be only one theory, and there should be only one diagnosis on how the failure came to be.

Alternate Hypothesis

Here is an example of an alternate starting point.

We could have started with the observation that, if I have a string "foo" contained within tags, contained within quotes, then the quotes are stripped but the tags are still there. And from that we could have devised the hypothesis that tags in double quotes are not stripped.

If we take a look into our code, now again in the buggy version: in order for the HTML markup to appear in the output, tag must be false. In order to have tag remain false, quote must be set. And quote can only be set in this line, so this would also have left us with this precise condition to look at.

So although we start with a different hypothesis, we end up with the same diagnosis, and we also end up with the same fix. And here we go. So from our hypothesis, we can deduce that tag must be false, because otherwise the markup wouldn't be in the output. Therefore, quote must be true, and therefore we have again our erroneous condition, which sets quote.

Alternate starting hypothesis, same diagnosis, and same fix.
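The alternate observation can be reproduced with a sketch like this one, assuming the buggy function as reconstructed from the lesson's description:

```python
def remove_html_markup(s):
    """Buggy version, reconstructed from the lesson's description (an assumption)."""
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:
            quote = not quote  # set by the leading double quote, outside any tag
        elif not tag:
            out = out + c
    return out


# Tags enclosed in double quotes: the quotes vanish, the tags stay.
result = remove_html_markup('"<b>foo</b>"')
print(result)
```

The leading double quote sets quote, and while quote is set the less than sign no longer switches to tag mode, so the markup is copied to the output.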

After the Fix

After we have fixed the program we need to check two things.

First, we need to check whether the same error has been made elsewhere. In our case, if there is a programmer who has confused the precedence of ors and ands, it may be a good thing to check the code for other places, and to make sure that all of these have proper parentheses around them, which anyway is a good programming practice.

Second, we want to make sure that the error does not occur again. We can do so either by writing appropriate tests, or by including an appropriate assertion in the code that makes sure the error will not occur again.
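As one possible way to write such tests, here is a sketch using Python's unittest module. The test class, the test names, and the chosen inputs are assumptions; the function is the fixed version as reconstructed from this lesson.

```python
import unittest


def remove_html_markup(s):
    """Fixed version (reconstructed) under test."""
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c
    return out


class RemoveHtmlMarkupTest(unittest.TestCase):
    def test_double_quotes_outside_tags_are_kept(self):
        self.assertEqual(remove_html_markup('"foo"'), '"foo"')

    def test_single_quotes_outside_tags_are_kept(self):
        self.assertEqual(remove_html_markup("'foo'"), "'foo'")

    def test_markup_is_stripped(self):
        self.assertEqual(remove_html_markup('<b>foo</b>'), 'foo')


# Build and run the suite explicitly so the script does not exit afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RemoveHtmlMarkupTest)
result = unittest.TextTestRunner().run(suite)
```

If someone later removes the parentheses again, the first test fails and the error is caught before it reaches users.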

Which Assertion

Q. In our case, there is a single assertion which could be placed in the loop body to catch these kinds of errors, and actually an assertion which could remain in the code. It is an assertion that captures the relationship between the variables tag and quote.

What is the correct assertion? Is it assert quote and not tag? Or quote or not tag? Or tag or not quote? Or is it assert tag and not quote?

S. This quiz can easily be answered if you recall our initial state diagram.

Initially, we were in the state no quote, no tag. If we see the beginning of a tag, we go into the no quote and tag mode. And in this mode, we can go into quote and tag mode, from which we exit again by seeing a closing quote, and when we see the closing tag, we go back into the non-quote, non-tag mode.

These are the three states that a program can be in. Which state is missing? The state that is missing is the state in which we would have quotes, but we are outside of tags.

This is exactly the problem we had in our program: we would take care of quotes even outside of tags. This state should not be reached. So how can we express this with our assert statement?

What we want to make sure of is that this state can never be reached. So this is quote and not tag. The inversion of this is tag or not quote. You can see that either we are in non-quote mode, or if we are in quote mode then we are in tag mode, and the assertion that actually checks this is this one. This is, therefore, the correct answer.
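To see this assertion at work, here is a sketch that adds it to both the fixed and the buggy version. Both function bodies are reconstructions from this lesson, the two function names are hypothetical, and placing the assertion at the end of the loop body is a choice made here so that every state change is checked.

```python
def remove_html_markup_fixed(s):
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c
        assert tag or not quote  # never in quote mode outside of a tag
    return out


def remove_html_markup_buggy(s):
    tag = False
    quote = False
    out = ""
    for c in s:
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif c == '"' or c == "'" and tag:  # parentheses missing
            quote = not quote
        elif not tag:
            out = out + c
        assert tag or not quote  # same assertion as above
    return out


print(remove_html_markup_fixed('<a href="x">foo</a> and "bar"'))

try:
    remove_html_markup_buggy('"foo"')
    caught = False
except AssertionError:
    caught = True
print("buggy version violates the assertion:", caught)
```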

Just Fix It

At this point, we have systematically fixed the first bug using the scientific method to systematically come up with a hypothesis and refine it, refine it again, possibly come up with alternatives until we end up with a diagnosis.

So if you're an experienced programmer--and I assume you are-- you may now wonder, "Why on Earth should I actually go through all these explicit steps if I can just jump into my debugger and fix the problem right away?" It's actually likely that you have spotted the problem at the very moment you saw the code. If you're a very experienced programmer, you can probably spot errors like these in hundreds and hundreds of lines from a distance of several hundred meters. You just look at the code and immediately see, "Well, there's something fishy in there."

And all of this is, of course, right. There are many problems which you can immediately see and which you can immediately fix within 5 minutes, including the one we just discussed. And it's perfectly okay, if it only takes 5 minutes, to jump right into your editor and to fix things, to jump into your interactive debugger and fix things, as long as it doesn't take more than 5 minutes--because what happens if it takes more than that?

Explicit Debugging

If you recall our story about you debugging late in the night, with your significant other calling you, and you not knowing what to do, let me now come up with a way to avoid all of this.

The answer here is Explicit Debugging. What does Explicit Debugging mean? It's simple. When you're debugging late in the night, it's usually because you try to keep everything in your head. You're making the hypothesis in your head, you're running the experiment, and you're keeping the results of the experiment in your head. This is okay for 5 minutes, but the longer this goes on, the harder it becomes to actually memorize all of this. And this is why everything gets so intense. You're so concentrated on trying to figure out what is going on, and this is why nobody can disturb you--because then you'd get out of your trance.

And the alternative to this implicit debugging where you're keeping everything in your head, of course, is explicit debugging. Explicit debugging at first simply means to write down what you're doing, make notes of what you see, make notes of what you expect, and make notes of what your current hypothesis is.

A common format for this, for instance, is to write down what the input was or generally what the experiment is, what you expected to see, and what you got instead. You may even want to write down the current hypothesis you're working on and whether this hypothesis is confirmed or rejected.
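Such a log can live in a plain text file or a notebook, but as an illustration, here is a sketch that keeps it as a list of records in Python. The class name, the field names, and the wording of the entries are assumptions; the entries themselves summarize experiments from this lesson.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    hypothesis: str  # what we currently believe causes the failure
    experiment: str  # the input or the change we tried
    expected: str    # what we should see if the hypothesis is right
    observed: str    # what we actually got
    verdict: str     # "confirmed" or "rejected"


log = [
    Experiment("tag is being set",
               "assert not tag in the loop, foo in double quotes",
               "assertion exception",
               "output foo, no exception",
               "rejected"),
    Experiment("the quote condition evaluates to true",
               "assert False in the quote branch, foo in double quotes",
               "assertion exception",
               "assertion exception",
               "confirmed"),
    Experiment("quotes in general are stripped",
               "foo in single quotes",
               "foo without quotes",
               "foo with single quotes",
               "rejected"),
]

for entry in log:
    print(entry.verdict.upper() + ":", entry.hypothesis)
```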

This way you keep a log of your actions, and this log gives you multiple advantages. To start, you can always review what you actually did and what the result was, so you don't have to memorize it, and you don't have to repeat it. Secondly, you can resume the session at any time because everything is already written down. You don't have to store this in your head. When your significant other calls you and asks you out for a nice dinner--well, everything is written down, and you can resume the next morning with a fresh mind and a nice dinner on top.

The third advantage, however, is that when you write things down and see them again, this often already brings the solution within reach, because being forced to become explicit frequently makes it clear to you what the problem actually is. So it structures your thinking, and it helps you organize your thinking towards successful debugging.

Talk to Somebody

Sometimes it helps a lot simply to tell a colleague or a friend what you're up to. This is the problem I'm working on, this is what I'm seeing, this is what I tried out.

In my experience, in 2 out of 3 cases, the problem resolves itself simply by telling it. When I was a teaching assistant at the university, students would come to me and come out with all sorts of problems. Now I'm a professor; they only come with the hard problems. But when I was a teaching assistant, they would come along and say something like, "Oh, Andres, Andres, you know, I have this problem over here. You see, here's this loop, and in this loop we have this binary search tree, and you know, I've been adding 2 elements here, added 3 elements here, and more and more and more, and now I'm doing this and I'm doing that, I'm deleting over here, but. . ." And of course I didn't understand a single thing. But I was still there and nodding and saying "yes, yes, yes, sure," as you do, but then the student would come up and say, "Oh, actually, now that I'm saying it, I'm not deleting anything at all. I think, yes! It must be! If the element is not in the tree, Yes, the element is not in the tree, that's why it's not deleting! Thank you Andres! Thank you so very much, everything is just beautiful. Thank you so much for helping me out here." I was happy, the student was happy, and I hadn't done anything.

Of course, occasionally there were also problems where students would really come along and have real problems, and then ask me with a big puzzled face, and then I would have to come up again and say, "Okay, okay. Can you repeat this a bit slower? What's this thing about foo and a binary tree and the quotes and the tags and everything?" And then we would go into regular debugging, but in two-thirds of the cases, yup!

Simply by listening, I would be able to debug it. Hey! I was a king of debugging.

Helpful Teddy Bear

Here's a story that is reported by Kernighan and Pike in their book, "The Practice of Programming."

One university computer center kept a teddy bear near the help desk. Students with mysterious bugs were required to explain the problems to the bear before they could speak to a human counselor.

Picture that--speak to a bear, and in two thirds of all cases, the problem would resolve itself.

That's how effective explicit debugging is.

Remove the Blindfold

To make another case for explicit debugging, let me illustrate this with a game of Mastermind.

You know Mastermind, don't you? Your opponent has 4 pins hidden behind a board, and your job is to guess the colors of these pins. So you come up with 1 trial, and you get points for every color that's in the right position, and for every color that's correct but not in the right position. This is what your opponent tells you.

And so, you make 1 attempt after the other, until after normally 7 or 8 attempts or so, you finally guess the right combination. Mastermind is a prime example of applying the scientific method: coming up with 1 experiment after the other to validate or reject a hypothesis on what's behind the board, and then based on your observations, you come up with a new hypothesis and another new hypothesis. You literally refine your hypothesis until you find the answer.

Now, when you're playing Mastermind, you always record what you tried before and what the result was. This is the same as explicit debugging. You write down what you do and what the result was. And this is what makes you effective. Not writing down what you do is like playing Mastermind with a blindfold, trying to remember all the combinations and all the results.

Sure, you can do that. You're a great thinker. You're a smart person. You could probably also play Mastermind blind. But why do that, if you can simply look at what you already did, interrupt the session at any time, and resume later at will?

Think about your significant other and remove your blindfold.

What Should You Do

Q. Which of these statements are true? You should fix a bug as soon as you spot the problem. You should always be sure to keep all details in your head, because writing them down takes too much time. Explaining the problem--maybe to a teddy bear, to a colleague, or a friend--can be helpful. And finally, up to 50% to 70% of software development effort is spent on validation and debugging.

S. Fix a bug as soon as you spot the problem? Well, if you can fix a bug, yes. But normally you'd like to think about the problem, whether you can generalize it, find the best place to fix it, and reason about it until you have a very good diagnosis, so this is definitely not true.

Always keep all details in your head. Well, the name of the last part was "Explicit Debugging," so you should not always try to keep all details in your head. This is wrong.

Explaining the problem can be helpful. Yes--plenty of anecdotal evidence for that.

And finally, up to 50% to 70% of software development effort is spent on validation and debugging. This is also true, and this is something we really, really, really must change.