CS258 Lesson 5. Testing in Practice


1. Introduction

Now that we have finished with random testing, we'll talk about more advanced issues, such as what to do if a fuzz tester overloads you with bugs. It sounds kind of silly, but it really can happen in practice. We are also going to talk about how to take a large test case that makes software fail and turn it into a very small one. Finally, we'll talk about how to report bugs in such a way that developers are more likely to pay attention to them.

2. Overwhelmed By Success

We can be overwhelmed by the success of our bug-finding effort. If you take a large software project that’s never been subjected to random testing, and hit it with a sophisticated random tester, this may very well happen.

This situation may look something like this: we run our tester over the weekend and come back to find 250 failed test inputs. Not all of these failed inputs are going to correspond to a unique bug. Of the 250 failed test inputs, maybe only 10 or 50 will correspond to distinct bugs.

There are two main ways to deal with bug inundation:

  1. Report A Bug
  2. Bug Triage

In the first solution, we simply pick a bug and report it. Then as soon as we get a new version of the system, we run the random test again, see which previously troubling test inputs are no longer problematic, and then report another bug. This turns out to be an effective strategy for smallish systems, where bugs can be fixed quickly.
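
The re-run step of this strategy can be sketched in a few lines. This is only an illustration under stated assumptions: `sut_v1` and `sut_v2` are hypothetical stand-ins for two successive builds of the system, `still_failing` is a helper name invented here, and a failure is assumed to show up as a raised exception.

```python
def still_failing(run_sut, failing_inputs):
    """Return the saved inputs that still make the given build fail."""
    remaining = []
    for test_input in failing_inputs:
        try:
            run_sut(test_input)
        except Exception:
            # Any exception counts as a failure in this sketch.
            remaining.append(test_input)
    return remaining


def sut_v1(x):
    # Hypothetical old build with two distinct bugs.
    assert x >= 0, "negative input"
    assert x != 0, "zero input"
    return 10 // x


def sut_v2(x):
    # Hypothetical new build: the negative-input bug has been fixed.
    assert x != 0, "zero input"
    return 10 // abs(x)


# Inputs the random tester saved while running against the old build.
saved_failures = [-3, 0, -1]
next_to_report = still_failing(sut_v2, saved_failures)
```

After the fix for the first reported bug arrives, only the inputs left in `next_to_report` are candidates for the next bug report.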

On the other hand, suppose the people fixing bugs have a slow fix cycle; let’s say we get a new version only every couple of years. In that case, we employ bug triage.

Bug triage is the process by which the severity of different bugs is determined and by which we start to disambiguate between different bugs. This helps us get a handle on which bugs we can report in parallel. Any inputs that trigger separate bugs can be reported in parallel, but if we report every bug-triggering input that we found, we’re going to cause a lot of duplicate bug reports.

How do we start getting a handle on which bug-triggering inputs map to different bugs and which ones map to the same bug? There’s no silver bullet, but we do have a number of different tools to disambiguate bugs.

  1. In the simplest case, the bugs in the system cause assertion violation messages, and we can disambiguate based on those messages. You look at the assertion messages and assume that distinct assertion violation messages are caused by distinct bugs in the software under test. This does not have to be true. We can have one defect that maps to multiple outputs that look different (although this is unlikely). Another scenario is that multiple defects cause the same symptom. What we hope happens is that a single defect maps to a single symptom.

  2. Unfortunately, not all bugs resolve to nice assertion violation messages, and bug disambiguation can be trickier when all we have is a core dump (a dump of the contents of main memory) or a stack trace. These give us some indication of what part of the code failed and of the stack frames leading up to that failure.

  3. Our third weapon when doing bug triage is to search over the revision history of the S.U.T., if we have access to its version control system. If a certain group of bugs first appeared just one revision ago while the other bugs are old, then it is likely that this group is being triggered by code that was recently committed.

  4. Our final weapon is to examine the test cases themselves. Test cases that trigger the same bug often have similar features. The problem is that looking over large, randomly generated test cases is really painful. This leads us to test-case reduction or test-case minimization.
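
The first two tools in the list above can be sketched as a simple bucketing step. This is a rough illustration, not a definitive triage tool: `failure_signature`, `bucket_failures`, `parse`, and `toy_sut` are hypothetical names invented here, failures are assumed to surface as Python exceptions, and the signature (assertion message if there is one, otherwise exception type plus the innermost function names from the stack trace) is just one plausible heuristic.

```python
import traceback
from collections import defaultdict


def failure_signature(exc, frames=2):
    """Build a coarse signature used to guess which failures share a bug."""
    # Tool 1: distinct assertion messages are assumed to be distinct bugs.
    if isinstance(exc, AssertionError) and str(exc):
        return ("assert", str(exc))
    # Tool 2: otherwise fall back to the innermost stack frames.
    stack = traceback.extract_tb(exc.__traceback__)
    innermost = tuple(frame.name for frame in stack[-frames:])
    return (type(exc).__name__, innermost)


def bucket_failures(run_sut, inputs):
    """Group failing inputs by signature; passing inputs are dropped."""
    buckets = defaultdict(list)
    for test_input in inputs:
        try:
            run_sut(test_input)
        except Exception as exc:
            buckets[failure_signature(exc)].append(test_input)
    return dict(buckets)


def parse(text):
    # Hypothetical S.U.T. helper with an assertion and a crashing path.
    assert len(text) < 8, "input too long"
    return int(text)


def toy_sut(text):
    # Hypothetical S.U.T. entry point; divides by zero on "0".
    return 100 // parse(text)


buckets = bucket_failures(
    toy_sut, ["123456789", "abc", "0", "xyz", "0000000000", "5"]
)
```

Each bucket then yields one candidate bug report instead of one report per failing input. As the list notes, the mapping is only a heuristic: one defect can land in several buckets, and several defects can share one.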

5. Test Case Reduction

7. Reporting Bugs

8. Example Bug Report

9. Building A Test Suite

10. Hard Testing Problems

11. Summary Of Testing Principles

  • Testers must want software to fail
  • Testers are detectives: they must watch for suspicious behavior and anomalies in the S.U.T.
  • All available test oracles should be used in testing
  • Test cases should contain values selected from the entire input domain
  • Interfaces that cross a trust boundary need to be tested with representable values, not just those from the ostensible (obvious) input domain
  • A little brute force goes a long way
    • Sometimes, selected interfaces can be exhaustively tested
    • Almost everything else can be randomly tested
  • Quality cannot be tested into bad software (Therac-25)
  • Testable software has:
    • no hidden coupling or side channels
    • few variables exposed to concurrent access
    • few globals shared between modules
    • no pointer soup
  • Code should be self-checking whenever possible, using lots of assertions; however:
    • these assertions are not used for error checking
    • assertions must never be side effecting
    • assertions should never be trivial or silly
  • When appropriate, all three kinds of input should be used as a basis for testing
    • APIs that are provided by the S.U.T. can be tested directly
    • APIs used by the S.U.T. can be tested using fault injection
    • non-functional inputs (such as thread scheduling in multi-threaded code) can be tested where they affect behavior
  • Failed coverage items do not provide a mandate to cover the failed items, but rather give clues to ways in which the tests are inadequate
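
The assertion guidelines in the list above can be illustrated with a small sketch. The function names `pop_checked_bad` and `pop_checked` are hypothetical and invented here; the example relies on the fact that running Python with the `-O` option removes `assert` statements.

```python
def pop_checked_bad(stack):
    # BAD: the pop() is a side effect inside the assertion. Under
    # `python -O` the assert is stripped, so nothing is ever popped.
    assert stack.pop() is not None


def pop_checked(stack):
    # Error checking for bad caller input uses an exception, not an assert.
    if not stack:
        raise IndexError("pop from empty stack")
    size_before = len(stack)
    item = stack.pop()
    # The assertion checks an internal invariant and has no side effects,
    # so removing it does not change what the function does.
    assert len(stack) == size_before - 1
    return item


items = [1, 2, 3]
top = pop_checked(items)
```

The second version behaves the same with assertions enabled or disabled, which is the property the guidelines are after.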

