[LLVMdev] About Code coverage Algorithms.

Tue Oct 14 09:09:03 PDT 2014

Hi Umesh,

My apologies if I shouldn't have sent this answer to all on this list, 
but since this is the first time I ever replied on any GCC/LLVM thread, 
I'm taking a guess. I'm finishing a Ph.D. in static analysis and I have 
lectured compiler courses in the last few years as well as implemented 
some instrumenters and can share some insights on your problem. However, 
I would first suggest you deeply read both references you gave because 
the devil clearly lies in the fine prints of those papers.

In my understanding, both techniques rely on the very intuitive fact 
that only certain nodes in a CFG are necessary to deduce the complete 
trace of a program (i.e., you can compress the information because of 
the natural redundancy of the CFG if you restrict desired information. 
For instance, if you want to know the value of every variable at every 
point in a program, chances are you will need more than if you just want 
to know the concrete flow). On the one hand, you have a technique that 
uses a maximum spanning-tree and on the other you have one that uses 
dominators (possibly extended to use post-dominators). If you are 
familiar with dominators, then it should be clear that both techniques 
directly or indirectly infers the interesting nodes from the predicates 
nodes (dominator trees actually split levels at nodes that are 
predicates most of the times). You would expect both to give 
approximately the same performance, but your benchmark says otherwise. 
Is it a contradiction ?

Careful reading of the ISSTA paper reveals that coverage of this 
algorithm is likely limited to block coverage, and cannot deduce edge 
coverage (not 100% sure on this one) and path coverage (much more sure 
about this one). When they do compare their work to that of Ball 
(reference 15 in the paper), they do not seem to say they do better, but 
simply that they do faster on block coverage. Highly likely that this 
algorithm is limited to statement coverage and frequencies. I didn't do 
the maths nor checked what specifically is missing from the dominator 
tree that makes path reconstruction impossible, but intuitively if I do 
not preserve any ordering in the increment of the counters, I will not 
be able to do path reconstruction. You should note that your solution-a 
in the provided reference is actually extended in that afore-mentioned 
reference 15.

TL;DR: Both techniques do not share the same coverage analysis power, so 
they perform differently. Also, Ball paper was published 10 years 
earlier, so time precedence may also play a role in that preference.

Feel free to discuss it more if you want.

Cheers,

Thierry

P.S. "For each basic block to be instrumented we create a Boolean
variable which is initialized to false indicating that the block has
not yet executed. " Read section 4 of the ISSTA paper. You should 
convince yourself that if I only give you which nodes in a boolean 
fashion, path reconstruction is impossible, even in easy cases. The 
paper from Ball clearly solves the "tracing" problem, and not a subset 
of the "coverage" problem.

On 10/14/2014 12:09 AM, Umesh Kalappa wrote:
> Hi All,
>
> Good day for everyone .
>
> We benchmarked  the code coverage algorithms  like
>
> a)Optimal Edge Profiling
> (ftp://ftp.cs.wisc.edu/pub/techreports/1991/TR1031.pdf .) that are
> adopted by  GCC and LLVM
> b)Dominator Leaf
> instrumentation(http://users.sdsc.edu/~mtikir/publications/papers/issta02.pdf)
>   and i don't see the practical implementation of this.
>
> Goal is to reduce the instrumentation point  and when we benchmarked
> both algorithms .The number of blocks that are  required to instrument
> is less in option-b over option-a.
>
> Need some expert insights on this like why the option-b is not widely
> used over option-a.
>
>
> Thank you
> ~Umesh