Code Coverage Analysis
A major part of the engineering of a professional software project is creating a test suite for it. Without some sort of test suite, it is impossible to know if the software works at all. The D language has many features to aid in the creation of test suites, such as unit tests and contract programming. But there's the issue of how thoroughly the test suite tests the code. The profiler can give valuable information on which functions were called, and by whom. But to look inside a function, and determine which statements were executed and which were not, requires a code coverage analyzer.
A code coverage analyzer will help in these ways:
- Expose code that is not exercised by the test suite. Add test cases that will exercise it.
- Identify code that is unreachable. Unreachable code is often the leftover result of program design changes. Unreachable code should be removed, as it can be very confusing to the maintenance programmer.
- Track down why a particular section of code exists: the test case that causes it to execute will show why.
- Use the per-line execution counts to reorder the basic blocks in a function so that the most used path contains the fewest jumps, thus optimizing it.
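As a minimal sketch of the first point, consider a hypothetical signOf function (the function and the file name signof.d are illustrations, not part of the original example set) whose unit test exercises only two of its three return statements. Building it with dmd -cov -unittest -main signof.d and running the result should produce a signof.lst in which the last return carries a zero count, pointing at the missing test case.

```d
// signof.d: hypothetical example for illustration only.

/// Returns -1, 0, or 1 depending on the sign of x.
int signOf(int x)
{
    if (x < 0)
        return -1;
    if (x == 0)
        return 0;
    return 1;   // no test below reaches this line
}

unittest
{
    assert(signOf(-5) == -1);
    assert(signOf(0) == 0);
    // Adding assert(signOf(7) == 1); would exercise the last return.
}
```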
Experience with code coverage analyzers shows that they dramatically reduce the number of bugs in shipping code. But coverage analysis isn't a panacea. A code coverage analyzer won't help with:
- Identifying race conditions.
- Memory consumption problems.
- Pointer bugs.
- Verifying that the program got the correct result.
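The last limitation is easy to underestimate. The sketch below uses a hypothetical average function, invented here for illustration: its single test executes every statement, so a coverage report would show the function as fully covered, yet the function is wrong for most inputs.

```d
// Hypothetical example: every statement is executed, but the answer is wrong.
int average(int a, int b)
{
    return a + b / 2;   // bug: should be (a + b) / 2
}

unittest
{
    // This test runs every statement in average(), but it cannot detect
    // the bug because 0 + 0 / 2 equals (0 + 0) / 2.
    assert(average(0, 0) == 0);
}
```

Coverage says the line ran; only an assertion on a more telling input, such as average(2, 4), would show that the result is incorrect.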
Code coverage analyzers are available for many popular languages, but they are often third-party products that integrate poorly with the compiler and are often very expensive. A big problem with third-party products is that, in order to instrument the source code, they must include what is essentially a full-blown compiler front end for the same language. Not only is this an expensive proposition, it often winds up out of step with the various compiler vendors as their implementations change and as they evolve various extensions. (gcov, the GNU coverage analyzer, is an exception, as it is both free and integrated into gcc.)
The D code coverage analyzer is built in as part of the D compiler. Therefore, it is always in perfect synchronization with the language implementation. It is implemented by establishing a counter for each line in each module compiled with the -cov switch. Code is inserted at the beginning of each statement to increment the corresponding counter. When the program finishes, the runtime collects all the counters, merges them with the source files, and writes the reports out to listing (.lst) files.
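The mechanism can be imitated by hand to see what the compiler and runtime are doing. The following is only a sketch under stated assumptions: the counts array, the source table, and the report function are hypothetical stand-ins written for this illustration, and the real -cov instrumentation is generated by the compiler rather than typed into the source.

```d
import std.format : format;
import std.stdio : write;

// Hypothetical stand-ins for what a -cov build manages itself:
// one counter per source line of the toy function (index 0 unused).
uint[4] counts;

immutable string[4] source = [
    "",
    "int r = x;",
    "if (x < 0)",
    "    r = -x;",
];

int absValue(int x)
{
    counts[1]++; int r = x;      // increment placed before each statement
    counts[2]++; if (x < 0)
    {
        counts[3]++; r = -x;
    }
    return r;
}

// Merge the counters with the source text, one "count|source" row per line.
string report()
{
    string s;
    foreach (n; 1 .. source.length)
    {
        // Executable lines that never ran are shown as 0000000.
        s ~= counts[n] ? format("%7s|", counts[n]) : "0000000|";
        s ~= source[n] ~ "\n";
    }
    return s;
}

void main()
{
    absValue(3);
    absValue(7);
    write(report());   // the r = -x row stays at zero: no negative input
}
```

Because neither call passes a negative number, the counter for the r = -x line is never incremented, which is exactly the kind of gap a coverage listing is meant to expose.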
For example, consider the Sieve program:
    /* Eratosthenes Sieve prime number calculation. */

    import std.stdio;

    bool[8191] flags;

    int main()
    {
        int i, prime, k, count, iter;

        writeln("10 iterations");
        for (iter = 1; iter <= 10; iter++)
        {
            count = 0;
            flags[] = true;
            for (i = 0; i < flags.length; i++)
            {
                if (flags[i])
                {
                    prime = i + i + 3;
                    k = i + prime;
                    while (k < flags.length)
                    {
                        flags[k] = false;
                        k += prime;
                    }
                    count += 1;
                }
            }
        }
        writefln("%d primes", count);
        return 0;
    }
Compile and run it with:
    dmd sieve -cov
    sieve
When the program finishes, an output file called sieve.lst is created, the contents of which are:
           |/* Eratosthenes Sieve prime number calculation. */
           |
           |import std.stdio;
           |
           |bool[8191] flags;
           |
           |int main()
           |{
          5|    int i, prime, k, count, iter;
           |
          1|    writeln("10 iterations");
         22|    for (iter = 1; iter <= 10; iter++)
           |    {
         10|        count = 0;
         10|        flags[] = true;
     163840|        for (i = 0; i < flags.length; i++)
           |        {
      81910|            if (flags[i])
           |            {
      18990|                prime = i + i + 3;
      18990|                k = i + prime;
     168980|                while (k < flags.length)
           |                {
     149990|                    flags[k] = false;
     149990|                    k += prime;
           |                }
      18990|                count += 1;
           |            }
           |        }
           |    }
          1|    writefln("%d primes", count);
          1|    return 0;
           |}
    sieve.d is 100% covered
The numbers to the left of the | are the execution counts for that line. Lines that have no executable code are left blank. Lines that have executable code, but were not executed, have a "0000000" as the execution count. At the end of the .lst file, the percent coverage is given.
Three lines have an execution count of 1; each was executed once. The declaration line for i, prime, etc., shows 5 because there are 5 declarations, and the initialization of each declaration counts as one statement.
The first for loop shows 22. This is the sum of the 3 parts of the for header. If the for header is broken up into 3 lines, the data is similarly divided:
          1|    for (iter = 1;
         11|         iter <= 10;
         10|         iter++)
which adds up to 22.
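The three figures follow from how a for loop runs: the initializer executes once, the condition executes before each of the 10 passes and one final time to end the loop, and the increment executes after each pass. A small sketch, with a hypothetical helper named forHeaderCounts that rewrites the loop so each part can be counted by hand:

```d
import std.stdio : writeln;

// Counts how often each part of `for (iter = 1; iter <= limit; iter++)` runs.
// Returns [init, test, step]. The function name is hypothetical.
int[3] forHeaderCounts(int limit)
{
    int initCount, testCount, stepCount;
    int iter;

    initCount++; iter = 1;              // initializer: runs once
    while (true)
    {
        testCount++;                    // condition: runs before every pass,
        if (!(iter <= limit))           // plus once more to end the loop
            break;
        // the loop body would go here
        stepCount++; iter++;            // increment: runs after every pass
    }
    return [initCount, testCount, stepCount];
}

void main()
{
    auto c = forHeaderCounts(10);
    writeln(c, " sum: ", c[0] + c[1] + c[2]);
}
```

For a limit of 10 the helper returns the counts 1, 11, and 10, matching the split listing.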
e1&&e2 and e1||e2 expressions conditionally execute the right-hand operand e2. Therefore, the right-hand operand is treated as a separate statement with its own counter:
           |void foo(int a, int b)
           |{
          5|   bar(a);
          8|   if (a && b)
          1|       bar(b);
           |}
Putting the right-hand operand on a separate line shows how the count of 8 is divided:
           |void foo(int a, int b)
           |{
          5|   bar(a);
          5|   if (a &&
          3|       b)
          1|       bar(b);
           |}
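The article does not say which calls produced these counts. One set of inputs that would do so, offered here as an assumption, is five calls in which a is nonzero three times and both a and b are nonzero once. The sketch below counts by hand with hypothetical counters and spells out the short-circuit evaluation so that each operand has its own increment:

```d
import std.stdio : writeln;

// Hypothetical counters, one per counted line in the second listing.
int cBarA, cLeft, cRight, cBarB;

void bar(int) {}

void foo(int a, int b)
{
    cBarA++; bar(a);
    cLeft++;                       // left operand: evaluated on every call
    bool cond = a != 0;
    if (cond)
    {
        cRight++;                  // right operand: only when a is nonzero
        cond = b != 0;
    }
    if (cond)
    {
        cBarB++; bar(b);
    }
}

// Calls foo with five argument pairs chosen to match the listing.
int[4] runCalls()
{
    cBarA = cLeft = cRight = cBarB = 0;
    foo(0, 0); foo(0, 1); foo(1, 0); foo(2, 0); foo(3, 4);
    return [cBarA, cLeft, cRight, cBarB];
}

void main()
{
    writeln(runCalls());
}
```

With these inputs the counters come out as 5, 5, 3, and 1, and the left and right operand counts add up to the 8 reported when the whole condition sits on one line.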
Similarly, for the e?e1:e2 expressions, e1 and e2 are treated as separate statements.
Controlling the Coverage Analyzer
When the -cov switch is thrown, the version identifier D_Coverage is defined.
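A program can test this identifier with a version block to behave differently in coverage builds. In the minimal sketch below, the buildKind function and the strings it returns are hypothetical; only the D_Coverage identifier comes from the compiler.

```d
import std.stdio : writeln;

// Reports whether this binary was compiled with -cov.
string buildKind()
{
    version (D_Coverage)
        return "coverage build";
    else
        return "regular build";
}

void main()
{
    writeln(buildKind());
}
```

A typical use would be to skip long-running or timing-sensitive code paths when the extra counter increments would distort them.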