Now that we have finished with random testing, we'll talk about more advanced issues, such as what to do if a fuzz tester overloads you with bugs. It sounds kind of silly, but it can really happen in practice. We are also going to talk about how to take a large test case that makes software fail and turn it into a very small one. Finally, we'll talk about how to report bugs in such a way that developers are more likely to pay attention to them.
We can be overwhelmed by the success of our bug-finding effort. If you take a large software project that’s never been subjected to random testing, and hit it with a sophisticated random tester, this may very well happen.
This situation may look something like this: we run our tester over the weekend and come back to find 250 failed test inputs. Now, not all of these failed inputs are going to correspond to unique bugs. Of the 250 failed test inputs, maybe 10 or 50 will correspond to distinct bugs.
There are two main ways to deal with bug inundation:
In the first solution, we simply pick a bug and report it. Then as soon as we get a new version of the system, we run the random test again, see which previously troubling test inputs are no longer problematic, and then report another bug. This turns out to be an effective strategy for smallish systems, where bugs can be fixed quickly.
On the other hand, the people fixing bugs may have a slow fix cycle. Let's say we get a new version every couple of years. If this is the case, then we employ bug triage.
Bug triage is the process by which the severity of different bugs is determined, and in which we start to disambiguate between different bugs. This helps us get a handle on which bugs we can report in parallel. Any inputs that trigger separate bugs can be reported in parallel, but if we report all the bug-triggering inputs that we found, we're going to cause a lot of duplicate bug reports.
How do we start getting a handle on which bug-triggering inputs map to different bugs, and which ones map to the same bug? There's no silver bullet, but we do have a number of different tools to disambiguate bugs.
In the simplest case, the bugs in the system cause assertion violation messages, and one thing we can do is disambiguate based on those messages. You look at the assertion messages and make the assumption that distinct assertion violation messages are caused by distinct bugs in the software under test. This does not have to be true. We can have one defect that maps to multiple outputs that look different (although this is unlikely). Another scenario is that multiple defects cause the same symptom. What we hope happens is that a single defect maps to a single symptom.
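As a rough illustration of disambiguating by assertion message, here is a minimal Python sketch. The message format matched by `ASSERT_RE`, the file names, and the sample outputs are all made up for the example; a real tool's output would need its own pattern.

```python
import re
from collections import defaultdict

# Hypothetical pattern for a C-style assertion failure line; adjust for the real tool.
ASSERT_RE = re.compile(r"Assertion [`'](?P<expr>.+?)' failed")

def bucket_by_assertion(failures):
    """Group failing inputs by the assertion text found in their output.

    `failures` maps an input name to the text the program printed when it failed.
    """
    buckets = defaultdict(list)
    for name, output in failures.items():
        match = ASSERT_RE.search(output)
        key = match.group("expr") if match else "<no assertion message>"
        buckets[key].append(name)
    return dict(buckets)

# Made-up failure outputs from an imaginary weekend run.
failures = {
    "t1.c": "cc: foo.cpp:10: Assertion `x > 0' failed.",
    "t2.c": "cc: bar.cpp:99: Assertion `p != NULL' failed.",
    "t3.c": "cc: foo.cpp:10: Assertion `x > 0' failed.",
    "t4.c": "Segmentation fault",
}
buckets = bucket_by_assertion(failures)
for message, inputs in buckets.items():
    print(message, inputs)
```

Each bucket is only a guess at "one bug", for the reasons given above: one defect can show up under several messages, and several defects can share one.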
Unfortunately, not all bugs resolve to nice assertion violation messages, and bug disambiguation can be trickier when all we have is a core dump (a dump of the contents of main memory) or a stack trace. These are going to give us some indication of what part of the code failed and of the stack frames leading up to that failure.
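One common heuristic, not something prescribed here, is to treat two failures as probably the same bug when the innermost few stack frames match. The sketch below assumes the traces have already been parsed into lists of function names, innermost frame first; the function names and the `depth` parameter are hypothetical.

```python
from collections import defaultdict

def bucket_by_stack(traces, depth=3):
    """Group failing inputs whose innermost `depth` stack frames are identical."""
    buckets = defaultdict(list)
    for name, frames in traces.items():
        buckets[tuple(frames[:depth])].append(name)
    return dict(buckets)

# Made-up traces: each list is innermost frame first.
traces = {
    "a.html": ["memcpy", "copy_node", "layout_table", "main"],
    "b.html": ["memcpy", "copy_node", "layout_table", "render", "main"],
    "c.html": ["free", "drop_style", "main"],
}
stack_buckets = bucket_by_stack(traces)
print(len(stack_buckets))
```

A small `depth` merges more failures together and a large one splits them apart, so the value is a judgment call.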
Our third weapon when doing bug triage is to search over the revision history of the software under test (SUT), if we have access to its version control system. If a certain group of bugs first appeared just one revision ago and the other bugs are old, then it is likely that this group of bugs is being triggered by code that was recently committed.
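Searching the revision history can be sketched as a binary search for the first revision on which an input fails. This is a minimal sketch under stated assumptions: revisions are ordered oldest to newest, the oldest one passes, the newest one fails, and once the failure appears it stays. The `fails` callback and the revision numbers are hypothetical stand-ins for checking out and testing a real revision.

```python
def first_failing_revision(revisions, fails):
    """Binary search for the first revision on which `fails` returns True.

    Assumes revisions[0] passes, revisions[-1] fails, and the failure,
    once introduced, is present in every later revision.
    """
    lo, hi = 0, len(revisions) - 1   # lo is known to pass, hi is known to fail
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]

# Imaginary history: revisions 100..119, with the defect introduced at 113.
revisions = list(range(100, 120))
culprit = first_failing_revision(revisions, lambda rev: rev >= 113)
print(culprit)
```

Failing inputs that share the same first failing revision are good candidates for being the same bug.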
Our final weapon is to examine the test cases themselves. Test cases that trigger the same bug often have similar features. The problem is that looking over large, randomly generated test cases is really painful. This leads us to test-case reduction, also called test-case minimization.
Test-case reduction is the process of taking some large input that triggers a failure and turning it into a small input. It's usually the case that many bugs can be triggered by small inputs. On the other hand, it's often the case that, for example, when we discover a web page that crashes Firefox, the page that caused the crash is giant, probably some huge page that we spotted in the wild. Generally speaking, though, the pattern at the center of the thing that crashes Firefox could probably be found in some small web page.
What we can do is figure out, by hand, what part of the input caused the test case to fail. One thing we can do is eliminate part of the input. Sometimes we do this in a smart way; for example, we might just know that some part of it is unlikely to trigger the crash. Or we might just chop something out blindly and see if the smaller input still triggers the failure. If it doesn't, then we go back to our original test case and try again. We do this a number of times, and if we're really lucky, at the very end of this process we're left with a really small test case. That's the thing we'd like to report to the people developing the software under test. Of course, even if we're not reporting bugs to someone else, that is to say we're finding bugs in software that we wrote, it's still really nice to have a minimized test case because it makes it much easier to track down the failure.
So option one, the manual option, has been done by people debugging for probably about as long as computer science has been around. The second option is a really nice technique called delta debugging, which automates this process. If you can write a script that can tell automatically whether a particular input triggers a failure, that is to say, you load up the web page in Firefox and see if it crashes by looking at the exit code that it provides to the operating system, then delta debugging is a framework that takes your script and the test input and automates this process in a loop. The delta debugger has a bunch of heuristics built in for eliminating parts of the input, and the loop terminates when the delta debugger can't reduce the input any more.
I don't want to go into this technique in a ton of detail, because Andreas Zeller, who developed delta debugging, is going to be teaching a Udacity class sometime soon. It's probably going to be really interesting, and what I hope is that I've been intriguing enough that you'll really seriously consider taking his class, which is going to be about debugging. Right now I'll show you an example.
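Before the demo, here is a minimal sketch of the reduction loop described above. It is a greedy chunk-removal reducer, a simplification in the spirit of delta debugging rather than the actual delta debugging algorithm, and the `still_fails` predicate stands in for the script that runs the program and reports whether the failure still occurs. The "crash" condition and the sample page are made up.

```python
def reduce_input(data, still_fails):
    """Greedily remove chunks of `data` while `still_fails` keeps returning True.

    Chunks start at half the input size and are halved whenever a full pass
    removes nothing, so the result cannot be shrunk by deleting any single
    element.
    """
    assert still_fails(data), "the starting input must trigger the failure"
    chunk = len(data) // 2
    while chunk >= 1:
        i = 0
        removed_any = False
        while i < len(data):
            candidate = data[:i] + data[i + chunk:]
            if still_fails(candidate):
                data = candidate       # keep the smaller input, retry at the same position
                removed_any = True
            else:
                i += chunk             # this chunk is needed; move on
        if not removed_any:
            chunk //= 2                # nothing removable at this size; try smaller pieces
    return data

# Made-up failure: the imaginary browser crashes when <b> is later closed by </i>.
def still_fails(page):
    return "<b>" in page and "</i>" in page

reduced = reduce_input("<html><p>intro</p><b>bold text</i><p>outro</p></html>", still_fails)
print(reduced)
```

The predicate is called many times, which is why the approach only pays off when the failure check can run unattended.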
What I want to do now is go back to the terminal. What we saw earlier is a bug that makes the compiler crash: when we compile small.c, it dies with an assertion violation that mentions the entry block. That's the assertion failure. The first thing we need to do is check whether the LLVM people already know about this one. So we go to their Bugzilla and search for this exact string. Nothing comes up, so they don't know about this bug, and that means we can report it.
Now I go back and reduce the test case. Instead of invoking the delta debugger, which is a nice, extremely general-purpose, powerful tool, I'm going to use a different tool called C-Reduce, developed by my group. It is extremely special purpose. It operates on exactly the same kind of idea that the delta debugger operates on; it just has extra knowledge embedded in it about how to reduce C programs. It takes a little while, so you don't have to wait for it like I did. Let's see how long it took: about eleven minutes. So it's not incredibly quick, but not too shabby either. Remember, this wasn't time that I had to be attending to the computer; the computer was just doing an automated search. And the resulting test case is pretty small, so that's nice.
What I'm going to do here is make a bug report. First I put in the version of Clang that I'm using, then the test case, and then I show Clang crashing on it. I haven't gone through all of the steps that I narrated earlier; because I report a lot of compiler bugs, I know what I can get away with. This includes, I believe, enough information for the LLVM people to reproduce the bug, so it should be a good bug report. We just need a name for this bug report. Bugzilla tries to help us avoid duplicates, and I don't think it has told us about anything that we didn't already know; in fact, this one is marked as fixed anyway. So we're good, and we submit it. Now the LLVM developers, when they get up in the morning, will have a bug report. And that concludes our demo of bug reporting.
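An automated reducer like the ones used in this demo needs a script that decides whether a candidate input still triggers the same failure. Here is a hedged sketch of such a check: it runs a command and requires both a nonzero exit status and a specific message on standard error. The function name, the message text, and the stand-in "compiler" (a child Python process) are all hypothetical; a real script would invoke the actual compiler on the candidate file.

```python
import subprocess
import sys

def still_interesting(command, needle, timeout=60):
    """Return True if running `command` fails and its stderr contains `needle`."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # a hang is a different symptom; do not confuse it with this bug
    return proc.returncode != 0 and needle in proc.stderr

# Stand-in for a crashing compiler: a child Python process that prints a made-up
# assertion message and exits with a nonzero status.
fake_compiler = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('Assertion failed: entry block'); sys.exit(1)",
]
print(still_interesting(fake_compiler, "entry block"))
```

Matching on the exact message, not just on "it crashed", keeps the reducer from wandering off to a different bug while it shrinks the input.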
What I'd like to talk about, just really briefly, is what goes in a test suite for a piece of software. A test suite is just a collection of tests. It's often the case that the test suite can be run automatically. It's also often the case that the test suite is run periodically, for example nightly, or on every commit, or maybe at some less frequent interval if commits are frequent and the test cases are slow. The goal of a test suite is to show that the software under test has some desired properties, namely passing all the tests, although it's very common for real software to almost always be in a state of partial failure. The hope is that most of the time most of these failures are not critical and severe ones.
The question of what goes into a software project's test suite is to a large extent a matter of taste and preference, but on the other hand there are some pretty common features of nearly all test suites. First of all, a lot of feature-specific tests. These are small tests that exercise very specialized behaviors. For example, if we're developing some sort of a web browser, we might have tests checking that different HTML constructs render correctly, and that sort of thing. It's also very common for a test suite to contain large, realistic inputs. For example, if we're testing some sort of a microprocessor, such an input might run for a couple of hours. The purpose of these kinds of inputs is to put realistic stresses on the system and to exercise a lot of features in combination.
Finally, it's nearly always a good idea to include regression tests in a test suite. A regression test is basically any input that has caused any version of the software under test to fail at any time. There are several reasons regression tests exist, the main one being to make sure that the software under test doesn't regress, that is to say, it doesn't go back into a state in which it fails on a bug that we already fixed. There are a number of reasons why that can happen. First of all, whatever the defect was in the software that caused the bug in the first place, we might not have gotten rid of all the instances of that defect in the source code. For example, a piece of code might have been cut and pasted to several other places, and those other locations might not be causing our system to fail currently, but some other change might enable the bug to fire, and then it might happen again. Another reason is that it's pretty easy, for example through misuse of the revision control system, to accidentally go back to a version of a file from before we fixed a bug. If that happens, we want to catch it as soon as possible using a regression test. A third reason is that defects in software often correspond to errors in people's thinking. It's pretty often the case that the person who introduced a defect into the software didn't actually correct the error that they had in their thinking. Rather, maybe somebody else fixed the defect, and the person retains the mistaken assumption about some sort of an API or something. Due to the lingering error in somebody's head, they can go ahead and start adding similar defects to the system later on, and if we have good regression tests, we stand more of a chance of catching those kinds of things.
Something that usually doesn't go into a test suite is random tests. I'm not sure that I understand all the reasons, but random testing is often treated as a separate activity. It's probably related to the fact that random tests are often non-deterministic unless we're careful to reuse the same seed, and that they don't have a clear correctness criterion. Perhaps more importantly, random tests always have some possibility of showing us something new, that is to say, they have the possibility of producing a test case that we haven't seen before. That's exactly what we hope will happen with random testing, but for a test suite it's undesirable: we want the test results to be predictable and to consist of things that we know to test for. If all of a sudden the test suite starts issuing new and different tests, that's not necessarily good. So for whatever combination of these reasons, random testing is often a separate activity.
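The difference between a regression suite and a random test can be sketched in a few lines of Python. Everything here is made up for illustration: `parse_percent` is a toy function under test, and the regression inputs are imagined to have broken some earlier version of it.

```python
import random

def parse_percent(text):
    """Toy function under test (hypothetical): parse '42%' or '42' into 0.42."""
    return int(text.rstrip("%")) / 100

# Regression inputs: each is imagined to have made an earlier version fail.
REGRESSION_CASES = [("0%", 0.0), ("100%", 1.0), ("7", 0.07)]

def run_regression_suite():
    """Return the inputs that fail; the result is the same on every run."""
    return [text for text, expected in REGRESSION_CASES
            if parse_percent(text) != expected]

def random_test(seed, rounds=1000):
    """Random test; fixing the seed makes the generated inputs repeatable."""
    rng = random.Random(seed)
    for _ in range(rounds):
        n = rng.randint(0, 100)
        assert parse_percent(f"{n}%") == n / 100
    return rounds

print(run_regression_suite())
print(random_test(seed=1234))
```

With a fixed seed the random test becomes predictable enough to live in a suite, at the cost of no longer showing anything new on later runs.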
Sometimes we find ourselves faced with really hard software testing problems. Let's go over some of the characteristics of these problems.
Lack of a specification is common, or perhaps only lack of a good specification. There may be no comparable implementations of the kind of system we're developing, that is to say, it's the first system of its sort. These cases are quite hard, because it probably means we're developing the specification as we go.
Big systems are hard to test. Large, highly structured input spaces are also quite hard. To imagine a hard testing problem with a large, highly structured input space, consider for example the flight control computers on a spacecraft or on an airplane. These things take an enormous variety of input from all sorts of different redundant sensors. The time at which these inputs arrive is significant. The spacecraft or the airplane has all sorts of physical properties, like its altitude, its attitude, and the position of the various control surfaces, that all affect the dynamics. These kinds of systems are truly hard to test.
Nondeterminism makes a system very hard to test. The issue here is that we'll play a test case against the system once and it succeeds, but at some later time some variable not under our control causes the system to fail on that same input.
Lots of hidden state makes systems hard to test. Some sorts of systems are extremely hard to test, for example Java virtual machines that are run by financial organizations on lots of cores with huge amounts of memory. These things have so much internal state that when something goes wrong, it's almost impossible to make any inferences about what was going on inside. You need to try to reproduce the problem, but of course that is also extremely hard, because the problem probably happened three hours into some sort of a massive computation.
Finally, if we lack strong oracles, testing can be really hard. For example, a large molecular simulation might be very hard to test. If it's some sort of a new simulation code, we have no idea what the right answer is supposed to be. It's probably running on some sort of a large parallel machine, and we have a very hard time reproducing problems that take such a long time to occur. An autopilot is going to be extremely hard to test, because the response of the thing is inherently in terms of the stability and good behavior of the airplane, and of course this is an incredibly large, complex physical object that's hard to model and simulate in a reliable fashion. To think about making a strong test oracle for an autopilot seems almost inconceivable. And a giant JVM running on lots of cores for a long time, using a huge amount of heap, is very, very hard to test.
So the question we have to ask ourselves is how we handle these situations. How do we test these things? Often there aren't really any easy answers. What we can do is leverage weak oracles to the maximum extent possible. If any of these things, the simulation, the autopilot, or the JVM, crashes in a bad fashion, then we definitely know something's gone wrong. We can try to bootstrap some degree of confidence in the software under test by using small test inputs for which we can check the output, and then trying to argue that somehow, if it responds well for these inputs, it also responds well for other tests. And in the end, if we're thwarted in our attempts to do really good testing, we may have to rely on non-testing methods. Of course, we should be doing code inspections and using formal methods on critical systems in any case if we care about their reliability, but if we really can't test the system effectively, we may have to rely on these things more than we would like. That's just a quick survey of things that can make testing really hard in practice.
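The bootstrapping idea above can be sketched with a toy stand-in. Here `closed_form` plays the role of code whose answers we cannot independently check on large inputs; on small inputs we compare it with a slow, obviously correct computation, and on large inputs we fall back on weak oracles (it does not crash, and its results satisfy simple sanity properties). The function and the chosen properties are assumptions made for the example.

```python
def closed_form(n):
    """Stand-in for the code under test: sum of squares 0..n via a formula."""
    return n * (n + 1) * (2 * n + 1) // 6

def check_small_inputs(limit=200):
    """Strong check: compare against brute force where that is affordable."""
    for n in range(limit):
        assert closed_form(n) == sum(k * k for k in range(n + 1))
    return limit

def check_weak_oracles(inputs):
    """Weak check: no exception, result is a non-negative int that never decreases."""
    checked = 0
    for n in inputs:
        here, nxt = closed_form(n), closed_form(n + 1)
        assert isinstance(here, int) and here >= 0
        assert nxt >= here
        checked += 1
    return checked

print(check_small_inputs())
print(check_weak_oracles([10**6, 10**9, 10**12]))
```

Passing both checks does not prove the large-input answers are right; it only supports the argument that behavior seen on small inputs carries over.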
Alright, we've come nearly to the end of our course. What I'd like to do now is summarize what I think are the high points, that is, the most important testing principles that I've tried to convey in this course, and put them all in one place. So let's go through this.
First of all, testers must want software to fail. Second, testers are like detectives who are hunting down bugs. As detectives, testers have to be observant of all sorts of suspicious behaviors and anomalies in the software under test. My guess is that a number of really serious bugs that have occurred were things that had already been noticed by people but had been swept under the rug, because the people were busy, they just wanted to ship the product, or maybe they were users who didn't know what the bugs meant. Users do have the luxury of ignoring bugs; testers don't. So it's really important not to sweep things under the rug.
All available test oracles should be used as a basis for testing. You might have been tempted to think, from the language I was using about strong oracles versus weak ones, that if we had a couple of good strong oracles available, maybe that's all we'd need and we wouldn't use them all. I hope I've convinced you by now that this is not the case. All of the oracles should be used, because oracles generally detect different kinds of faults, and even if they detect the same faults, weak oracles might be much cheaper to use.
Test cases should contain values selected from the entire input domain.
If there's doubt about what exactly the domain of something is, it's good to have a talk with the developers. Interfaces that cross a trust boundary need to be tested with all representable values, not just those from the ostensible input domain. As an example we looked at, if we're writing a web server, we might hope that everybody submits data that is well formatted, but it's most likely the case that they won't, and the reason they won't is that they'll be trying to break into our web server. So we need to test on that kind of data to ensure that we can correctly reject it. Similarly, a piece of software like the Linux kernel has a trust boundary at the system call interface, that is to say, at the interface between user-mode applications and the Linux kernel. The Linux kernel, like the web server, can't trust those clients to make well-formed requests all the time. It has to expect that those clients, if not actually hostile, are at least buggy and will do all sorts of crazy stuff, and it has to catch that sort of thing without crashing or violating its security policy.
A little brute force goes a long way when testing. In particular, in certain restricted circumstances we can do exhaustive testing, and almost anything else can be randomly tested.
Quality cannot be tested into bad software. We saw with the Therac-25 example that the control software for the radiation therapy machine was probably so broken that almost no amount of testing would have been sufficient to make it great. It needed to be thrown away, and they needed to start over. I'm sure we've all seen software that looks like that. In contrast with examples like the Therac-25, testable software has a few of the following properties: no hidden coupling between modules, that is, no side channels where modules can share information without it being visible to the system developers; few variables shared between threads; few global variables shared between modules; and no pointer soup, that is to say, no huge data structures with pointers going everywhere, where we can't possibly keep track of who's changing what and what's valid and what's not.
Code should be self-checking whenever possible, using plenty of assertions. Assertions are never used for error checking. Rather, they're used to check for logically impossible conditions that imply some sort of an internal consistency violation. Assertions must never be side-effecting, because if they are, and you turn them off, the system's behavior will change, and this leads to madness among developers. Finally, assertions should never be silly or trivial, because first of all they serve no purpose, second they clutter the code, third they make things slower, and finally they fail to convey useful information to the next person who looks at the code.
When appropriate, all three kinds of input to a piece of software under test should be used as a basis for testing. Those include, obviously, the APIs provided by the software under test, which can be tested directly. APIs used by the software under test can be tested using fault injection techniques. Recall that these are things like substituting the library that provides the APIs with a different library that injects faults, or perhaps just putting a layer underneath to inject faults. Finally, non-functional inputs should be tested using whatever method you can get to work in your actual testing situation.
And finally, the last principle for testing: failed code coverage items do not provide a mandate to cover the failed items, no matter how tempting that might be. Rather, they give clues to ways in which the test suite is inadequate. Blindly improving the score according to the coverage metric is going to destroy those clues, and it's going to do it in such a way that it doesn't improve the quality of the test suite very much.
Taken together, the list of items that I've just given you comprises pretty much all that I know about testing, and the detailed version of these has been the content of this course. This is material that I had never taught before, so I hope it came out in a fairly coherent fashion. It's stuff that's been brewing in my mind for a long time. I wanted to teach it because I've felt for years that we don't seem to be doing a very good job teaching our students to test. What happens instead is that they write small test cases in response to assignments, they run them, they run the test cases we give them, they hand in their code, and then they never look at it again. It's hard to think of anything less like the real world of software development than the environment we create. What I've tried to do is structure this course a little bit differently, to emphasize things that are really important but that we often don't do a very good job with. This has been really enjoyable for me. It's been great to actually try to set this material down in a coherent fashion. I very much hope that this material has been useful for you and that the class has been enjoyable. Thank you.
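Two of the principles summarized above, exhaustive testing over a restricted input domain and side-effect-free assertions that check for logically impossible conditions, can be sketched together in a few lines. The function `saturating_add_u8` is a made-up piece of code under test, and restricting its inputs to 8-bit values is an assumption that makes exhaustive testing affordable.

```python
def saturating_add_u8(a, b):
    """Toy code under test: add two 8-bit unsigned values, clamping at 255."""
    total = a + b
    result = 255 if total > 255 else total
    # Self-check for a logically impossible condition. It only reads `result`,
    # so disabling assertions cannot change the function's behavior.
    assert 0 <= result <= 255
    return result

def exhaustive_test():
    """Try every pair of 8-bit inputs and compare against a simple reference."""
    checked = 0
    for a in range(256):
        for b in range(256):
            assert saturating_add_u8(a, b) == min(a + b, 255)
            checked += 1
    return checked

print(exhaustive_test())
```

Note that the assertion inside the function is not input validation; rejecting out-of-range arguments would be ordinary error checking and belongs in separate code.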