How do I understand the testing coverage achieved by my Hexawise-generated tests?

Use the Coverage Graph under the "Analysis" tab to visualize your testing coverage.

Written by Justin Hunter

Click on the "Analysis" link to use this feature.

The Analysis coverage charts can be extremely useful and help answer questions like: "How much coverage is each of my tests adding?" and "How much testing is enough?"

It takes a few minutes to understand what the information in the charts means. The number of Parameters and Values you enter on the "Parameters" screen determines how many total possible pairs of Values there are in your test model. A simple example with four Parameters of two Values each (8 Values in total) illustrates this:

Given these inputs, your model will have exactly 24 possible combinations of pairs of values, as shown below:
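If you would like to double-check that count yourself, here is a minimal Python sketch that enumerates every possible pair, assuming four Parameters with two Values each as in this example ("Other" is just a placeholder for the second Color value, which is not named in this article):

```python
from itertools import combinations, product

# A hypothetical model matching the example above: four Parameters, two Values each.
# "Other" stands in for the second Color value; any second value gives the same count.
parameters = {
    "Size": ["Large", "Small"],
    "Weight": ["Heavy", "Light"],
    "Color": ["Purple", "Other"],
    "Shape": ["Hexagon", "Circle"],
}

# Every possible pair of Values drawn from two different Parameters.
all_pairs = {
    frozenset([(p1, v1), (p2, v2)])
    for p1, p2 in combinations(parameters, 2)
    for v1, v2 in product(parameters[p1], parameters[p2])
}

print(len(all_pairs))  # 24 -> 6 Parameter pairings x 4 Value combinations each
```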

The first test case (Large / Heavy / Purple / Hexagon) will test for 6 of the 24 possible pairs.

So after the first test case, the coverage chart will show that 25% of the total possible pairs in this simple example have been tested at this point. So far so good.
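Continuing that sketch (reusing the parameters and all_pairs defined above), you can count the pairs covered by the first test and turn the count into a coverage percentage:

```python
from itertools import combinations

def pairs_in_test(test):
    """All pairs of (Parameter, Value) assignments exercised by one test case."""
    return {
        frozenset([(p1, test[p1]), (p2, test[p2])])
        for p1, p2 in combinations(test, 2)
    }

test_1 = {"Size": "Large", "Weight": "Heavy", "Color": "Purple", "Shape": "Hexagon"}
covered = pairs_in_test(test_1)

print(len(covered))                   # 6
print(len(covered) / len(all_pairs))  # 0.25 -> 25% of the 24 possible pairs
```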

The second scenario (Small / Light / Purple / Circle) will test another 6 pairs. Importantly, none of these 6 pairs of Values have been tested yet. In our first two tests, we will have tested a total of 12 pairs of Values.

So after 2 test cases, the chart shows that 50% of the possible pairs (i.e., 12 tested out of 24 possible) have been tested.
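Adding the second test to the same sketch confirms the cumulative count:

```python
test_2 = {"Size": "Small", "Weight": "Light", "Color": "Purple", "Shape": "Circle"}
covered = pairs_in_test(test_1) | pairs_in_test(test_2)

print(len(covered))                   # 12 -> the two tests share no pairs
print(len(covered) / len(all_pairs))  # 0.5 -> 50% of the 24 possible pairs
```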

Why do coverage charts start off with a steep trajectory (with lots of added coverage per test) only to flatten out towards the end (with only a little added coverage per test)?  Analyzing test number 3 shows us why:

There is no possible way to select values so that we test for 6 new pairs of values as we did in each of the first two tests. The best we can do is test 5 new pairs of values and 1 previously tested pair. In this 3rd test, the repeated pair, "Large and Hexagon", had already been tested in the first test.

After test 3, we have now tested 17 of the 24 total possible pairs. The coverage chart shows 70.8% (vs. 75% had we been able to squeeze 6 new pairs into test 3).

What is up with the final two test cases? We were able to achieve 25% coverage of pairs in test 1 and test 2. Why do test 5 and test 6 only achieve a measly 4.2% increase each?

The final two scenarios each add only a tiny amount of coverage because the first four test cases already covered all but two pairs of Values, and because those two remaining pairs require different Values of the same Parameter (one needs Small, the other needs Large), they cannot be covered in a single test case; at least two additional test cases are needed.

The only pair tested for the first time in scenario 5 is "Small and Hexagon".  The only new pair tested in scenario 6 is "Large and Circle".  That is only one-sixth as many new pairs in each test as compared to either of the first two tests. 

The likelihood of finding a new defect in test number 5 or 6 is much lower than finding a new defect in test case 1 or 2.
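Hexawise chooses the actual test cases for you, but a hypothetical six-test plan that is consistent with the counts described in this article reproduces the whole curve. The plan below (again reusing pairs_in_test and all_pairs from the sketches above) is an illustration only; tests 3 through 6 and the "Other" Color value are assumptions, not actual Hexawise output:

```python
# A hypothetical six-test plan whose per-test pair counts match this article's example.
plan = [
    {"Size": "Large", "Weight": "Heavy", "Color": "Purple", "Shape": "Hexagon"},
    {"Size": "Small", "Weight": "Light", "Color": "Purple", "Shape": "Circle"},
    {"Size": "Large", "Weight": "Light", "Color": "Other",  "Shape": "Hexagon"},
    {"Size": "Small", "Weight": "Heavy", "Color": "Other",  "Shape": "Circle"},
    {"Size": "Small", "Weight": "Heavy", "Color": "Purple", "Shape": "Hexagon"},
    {"Size": "Large", "Weight": "Light", "Color": "Purple", "Shape": "Circle"},
]

covered = set()
for n, test in enumerate(plan, start=1):
    new_pairs = pairs_in_test(test) - covered
    covered |= new_pairs
    print(f"Test {n}: +{len(new_pairs)} new pairs -> "
          f"{len(covered)}/{len(all_pairs)} = {len(covered) / len(all_pairs):.1%}")

# Test 1: +6 new pairs -> 6/24 = 25.0%
# Test 2: +6 new pairs -> 12/24 = 50.0%
# Test 3: +5 new pairs -> 17/24 = 70.8%
# Test 4: +5 new pairs -> 22/24 = 91.7%
# Test 5: +1 new pairs -> 23/24 = 95.8%
# Test 6: +1 new pairs -> 24/24 = 100.0%
```

The printed output shows exactly the shape you see in the "Analysis" chart: steep gains at first, then a long, flat tail.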

A few important points to consider when analyzing coverage information:

1) First, when used correctly (and thoughtfully), the information can be extremely useful. It gives you a quick method for objectively assessing "how much extra testing coverage am I achieving with each new test?" and "how much testing is enough?"

Many testing teams adopt a rule of thumb, for example stopping execution of the Hexawise-generated tests once they have achieved 80% coverage, because the diminishing marginal returns of further testing are clear after that point.

2) The second thing to keep in mind is cautionary. As George Box says, "All models are wrong. Some models are useful." It would be a mistake to look at the graph, see that "100% coverage" has been achieved after the final Hexawise-generated test, and conclude that the tests cover everything that should be tested.

An "Analysis" chart generated by Hexawise, like all software testing coverage reports, is an imperfect model of what should be covered (which is itself based on an imperfect model of the System Under Test). There could be significant aspects of the System Under Test that were not entered into the "Parameters" screen.

It is important to remember that one or more of those excluded aspects (hardware configuration, software configuration or plug-ins, the order in which actions are executed, whether a user navigates with a mouse or a keyboard, whether or not "submit" buttons are clicked multiple times in quick succession, etc.) could potentially cause a defect that would not be identified by your current set of tests.

