[LLVMdev] About Code coverage Algorithms.
thierry-3.lavoie at polymtl.ca
Tue Oct 14 09:09:03 PDT 2014
My apologies if I shouldn't have sent this answer to all on this list,
but since this is the first time I ever replied on any GCC/LLVM thread,
I'm taking a guess. I'm finishing a Ph.D. in static analysis and I have
lectured compiler courses in the last few years as well as implemented
some instrumenters and can share some insights on your problem. However,
I would first suggest you deeply read both references you gave because
the devil clearly lies in the fine prints of those papers.
In my understanding, both techniques rely on the very intuitive fact
that only certain nodes in a CFG are necessary to deduce the complete
trace of a program (i.e., you can compress the information because of
the natural redundancy of the CFG if you restrict desired information.
For instance, if you want to know the value of every variable at every
point in a program, chances are you will need more than if you just want
to know the concrete flow). On the one hand, you have a technique that
uses a maximum spanning-tree and on the other you have one that uses
dominators (possibly extended to use post-dominators). If you are
familiar with dominators, then it should be clear that both techniques
directly or indirectly infers the interesting nodes from the predicates
nodes (dominator trees actually split levels at nodes that are
predicates most of the times). You would expect both to give
approximately the same performance, but your benchmark says otherwise.
Is it a contradiction ?
Careful reading of the ISSTA paper reveals that coverage of this
algorithm is likely limited to block coverage, and cannot deduce edge
coverage (not 100% sure on this one) and path coverage (much more sure
about this one). When they do compare their work to that of Ball
(reference 15 in the paper), they do not seem to say they do better, but
simply that they do faster on block coverage. Highly likely that this
algorithm is limited to statement coverage and frequencies. I didn't do
the maths nor checked what specifically is missing from the dominator
tree that makes path reconstruction impossible, but intuitively if I do
not preserve any ordering in the increment of the counters, I will not
be able to do path reconstruction. You should note that your solution-a
in the provided reference is actually extended in that afore-mentioned
TL;DR: Both techniques do not share the same coverage analysis power, so
they perform differently. Also, Ball paper was published 10 years
earlier, so time precedence may also play a role in that preference.
Feel free to discuss it more if you want.
P.S. "For each basic block to be instrumented we create a Boolean
variable which is initialized to false indicating that the block has
not yet executed. " Read section 4 of the ISSTA paper. You should
convince yourself that if I only give you which nodes in a boolean
fashion, path reconstruction is impossible, even in easy cases. The
paper from Ball clearly solves the "tracing" problem, and not a subset
of the "coverage" problem.
On 10/14/2014 12:09 AM, Umesh Kalappa wrote:
> Hi All,
> Good day for everyone .
> We benchmarked the code coverage algorithms like
> a)Optimal Edge Profiling
> (ftp://ftp.cs.wisc.edu/pub/techreports/1991/TR1031.pdf .) that are
> adopted by GCC and LLVM
> b)Dominator Leaf
> and i don't see the practical implementation of this.
> Goal is to reduce the instrumentation point and when we benchmarked
> both algorithms .The number of blocks that are required to instrument
> is less in option-b over option-a.
> Need some expert insights on this like why the option-b is not widely
> used over option-a.
> Thank you
More information about the llvm-dev