[llvm-dev] Introducing the binary-level coverage analysis tool bcov

Fangrui Song via llvm-dev llvm-dev at lists.llvm.org
Tue Jul 7 17:17:27 PDT 2020


I noticed another tool in this area, which was probably available
before 1991: MIPS pixie.

From an IRIX document (https://irix7.com/techpubs/007-2479-001.pdf,
p. 90), I believe it works in a way very similar to bcov.

   Run pixie to generate the equivalent program containing
   basic-block-counting code.
   % pixie myprog
  ...
   pixie takes myprog and writes an equivalent program, myprog.pixie,
   containing additional code that counts the execution of each basic
   block

IRIX is a discontinued operating system, and I can't find more
information on this tool. I hope some MIPS folks on the list can
provide more details.

On 2020-06-28, Ammar Ben Khadra wrote:
>Hi Fangrui,
>
>Many thanks for providing such detailed feedback! Please find my comments
>inlined below.
>
>- Ammar
>
>
>On Sun, Jun 28, 2020 at 5:59 AM Fangrui Song <maskray at google.com> wrote:
>
>> On 2020-06-26, Ammar Ben Khadra via llvm-dev wrote:
>> >## TL;DR
>> >
>> >We introduce bcov, an open-source binary-level coverage analysis tool [1].
>> >The details are discussed in our paper [2], which is accepted to
>> >ESEC/FSE'20. bcov statically instruments x86-64 ELF binaries without
>> >compiler support. It features several techniques that allow it to achieve
>> >high performance, transparency, and flexibility.
>> >
>> >For example, running "make check-llvm-codegen" with an instrumented
>> >version of llc introduces a mean performance overhead of about 32%.
>> >The overhead for smaller binaries can be as low as 2%. bcov accurately
>> >reports code coverage with a mean F-score of 99.86%, all without
>> >introducing any test regressions.
>> >
>> >
>> >## Design overview
>> >
>> >Given an ELF module as input, bcov will first analyze it to choose
>> >appropriate probe locations and estimate the size of the program segments
>> >required for patching. This analysis depends on the instrumentation policy
>> >chosen by the user. Then, bcov patches the input module by adding two
>> >program segments; one segment contains trampoline code and the other keeps
>> >coverage data.
>> >
>> >bcov uses trampolines to instrument the binary: it inserts probes at
>> >targeted locations to track basic block coverage. Each probe consists
>> >of a detour that diverts control flow to a designated trampoline. The
>> >trampoline updates coverage data using a single pc-relative mov
>> >instruction (e.g., mov BYTE PTR [rip+0xadd88],1), executes the
>> >relocated instructions, and then restores control flow to its original
>> >state.
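>> >
>> >As a rough illustration only (a conceptual model, not bcov's actual
>> >code; the array name and probe index below are made up), the effect of
>> >a probe firing is equivalent to setting one byte in a per-module
>> >array, which x86-64 compilers typically lower to exactly the kind of
>> >rip-relative store shown above:
>> >
>> >   /* sketch.c - conceptual model of a single probe                  */
>> >   static unsigned char cov[1 << 16];  /* stands in for the added    */
>> >                                       /* coverage data segment      */
>> >   void probe_123(void) {
>> >       /* typically compiles to: mov BYTE PTR [rip+<offset>],1       */
>> >       /* the store clobbers no general-purpose register and no      */
>> >       /* eflags bit, which is what makes the probe transparent      */
>> >       cov[123] = 1;
>> >   }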
>> >
>> >To dump coverage data, the user can inject our small runtime library,
>> >libbcov-rt, using the LD_PRELOAD mechanism. This library scans the
>> >memory mappings of the current process and dumps every data segment
>> >that starts with a specific magic number. This means that coverage
>> >data is dumped separately for each ELF module. Dumping takes place at
>> >process shutdown or upon receiving a user signal; the latter enables
>> >online coverage tracking.
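>> >
>> >The following is a minimal sketch of that dumping idea only; the magic
>> >value, the file naming, and the on-disk format below are placeholders,
>> >not what libbcov-rt actually uses:
>> >
>> >   /* bcov-rt-sketch.c - scan /proc/self/maps and dump writable      */
>> >   /* mappings that begin with a known magic number                  */
>> >   #include <signal.h>
>> >   #include <stdio.h>
>> >   #include <stdlib.h>
>> >   #include <string.h>
>> >
>> >   #define FAKE_MAGIC 0xB0C0FFEEULL    /* placeholder magic number   */
>> >
>> >   static void dump_coverage(void) {
>> >       FILE *maps = fopen("/proc/self/maps", "r");
>> >       char line[512];
>> >       if (!maps) return;
>> >       while (fgets(line, sizeof line, maps)) {
>> >           unsigned long lo, hi;
>> >           unsigned long long head;
>> >           char perms[5], path[64];
>> >           if (sscanf(line, "%lx-%lx %4s", &lo, &hi, perms) != 3)
>> >               continue;
>> >           if (perms[0] != 'r' || perms[1] != 'w')
>> >               continue;               /* only writable mappings     */
>> >           if (hi - lo < sizeof head)
>> >               continue;
>> >           memcpy(&head, (const void *)lo, sizeof head);
>> >           if (head != FAKE_MAGIC)
>> >               continue;               /* not a coverage segment     */
>> >           snprintf(path, sizeof path, "cov.%lx.dump", lo);
>> >           FILE *out = fopen(path, "wb");
>> >           if (out) { fwrite((void *)lo, 1, hi - lo, out); fclose(out); }
>> >       }
>> >       fclose(maps);
>> >   }
>> >
>> >   /* sketch only: dumping from a handler is not async-signal-safe   */
>> >   static void on_signal(int sig) { (void)sig; dump_coverage(); }
>> >
>> >   /* runs when the library is injected with LD_PRELOAD              */
>> >   __attribute__((constructor)) static void rt_init(void) {
>> >       atexit(dump_coverage);          /* dump at process shutdown   */
>> >       signal(SIGUSR1, on_signal);     /* or on a user signal        */
>> >   }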
>> >
>> >
>> >## Key features
>> >
>> >bcov integrates several techniques including:
>> >
>> > - Probe pruning. We instrument only what is necessary for tracking code
>> >coverage. To this end, we bring Agrawal's probe pruning technique [3] to
>> >binary-level instrumentation.
>> >
>> > - Precise CFG analysis. Imprecision in the recovered CFG can have
>> >several effects, including causing the instrumented binary to crash.
>> >To improve CFG precision, we propose a novel jump table analysis and
>> >implement a variant of an existing non-return analysis [4].
>> >
>> > - Static instrumentation. We implement several techniques such as
>> >(1) optimized probe selection, (2) jump table instrumentation, and
>> >(3) systematic detour hoisting.
>> >
>> >The result is that our approach can work transparently, and with low
>> >overhead, on large and well-tested C and C++ programs.
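>> >
>> >To make the probe-pruning point above concrete, here is a hand-worked
>> >sketch (our labels, not bcov output):
>> >
>> >   int sign(int x) {       /* block A: function entry                */
>> >       int r;
>> >       if (x > 0)
>> >           r = 1;          /* block B: taken branch                  */
>> >       else
>> >           r = -1;         /* block C: fall-through branch           */
>> >       return r;           /* block D: function exit                 */
>> >   }
>> >
>> >Every execution of sign() passes through both A and D, so they form
>> >one superblock. Covering B or C implies covering that superblock, and
>> >in this CFG the converse holds as well, so probes on B and C alone are
>> >enough to reconstruct full basic block coverage.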
>> >
>> >
>> >## Relation to the compiler
>> >
>> >Code coverage analysis at the binary level, as implemented in bcov, can
>> >offer several benefits: first, it is highly flexible. A user can analyze
>> >the coverage of a particular module only. Even better, coverage tracking
>> >can be limited to specific functions, e.g., the ones affected by recent
>> >changes. Second, it is largely independent of the compiler and build
>> >system used. We observed consistent results after experimenting with
>> >four different versions of gcc and clang and three different build
>> >types, namely, debug, release, and LTO. Finally, the primitives we
>> >use to divert control flow and update coverage are simple. One can
>> >imagine (and hope)
>> >that extending bcov to system code, or other native languages, will not
>> >require a lot of work.
>> >
>> >However, the viability of static binary-level coverage analysis ultimately
>> >depends on the compiler toolchain that produces the binary. Specifically,
>> >we can think of two areas where compilers might help our tool by adding
>> >extra support, or at least by not emitting code that is difficult to
>> >analyze:
>> >
>> > - Source-level coverage. Knowing whether a particular basic block
>> >was covered in the binary is quite interesting. However, developers
>> >would ultimately want to map this coverage information to source-level
>> >artifacts. Statement coverage can already be supported with the help
>> >of DWARF debugging information. But bcov can report the coverage of
>> >individual branch decisions at the binary level, so leveraging this
>> >information to obtain source-level MC/DC seems to be a natural next
>> >step (see the sketch after this list). Ideally, we want to infer MC/DC
>> >using what we already have, namely, binary-level coverage and DWARF
>> >information. However, adding support for a custom mapping format
>> >similar to the one available in llvm-cov might be necessary. In this
>> >regard, we appreciate your suggestions on the best way to support
>> >MC/DC, if at all feasible. (feedback needed)
>> >
>> > - Function model. We assume that a function occupies a contiguous
>> >code region, with the function's main entry at the beginning of this
>> >region. We also consider each landing pad to be a function entry.
>> >Based on this, we assume that the identified entries predominate all
>> >basic blocks inside the function. In other words, a function cannot
>> >reach an arbitrary basic block inside another function without first
>> >visiting one of its entries. This assumption seems to hold pretty well
>> >in the binaries we experimented with, but we are not sure to what
>> >extent it generalizes to other compilers and ISAs. (feedback needed)
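>> >
>> >Here is the sketch referenced above: a worked illustration (ours, not
>> >from the paper) of why binary-level branch decisions are useful raw
>> >material for MC/DC:
>> >
>> >   int guard(int a, int b) {
>> >       /* with short-circuit evaluation the compiler emits one       */
>> >       /* conditional branch per condition, roughly:                 */
>> >       /*     test edi,edi ; je .skip                                */
>> >       /*     test esi,esi ; je .skip                                */
>> >       if (a && b)
>> >           return 1;       /* reached only when both conditions hold */
>> >       return 0;           /* reached when either condition is false */
>> >   }
>> >
>> >Because each condition gets its own branch in the binary, bcov's
>> >per-branch coverage records whether each condition evaluated to both
>> >outcomes. Mapping those branches back to source conditions (e.g., via
>> >DWARF line/column information) is the part that still needs a mapping
>> >format or extra compiler support.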
>> >
>> >
>> >## Potential future work
>> >
>> >Updating coverage using a single pc-relative mov instruction is
>> >transparent in the sense that it does not clobber any general-purpose
>> >register. It even preserves the CPU's eflags. This, in turn, has
>> >greatly simplified the implementation of our prototype. That is, we
>> >did not have to think about saving/restoring the CPU state. However,
>> >putting mechanisms in place to allow eflags to be *safely* clobbered,
>> >e.g., liveness analysis, can open the door for several design
>> >alternatives where a pc-relative mov (e.g., mov BYTE PTR [rip+0xadd88],1)
>> >can be replaced with:
>> >
>> >  - or BYTE PTR [rip+0xadd88],c: where c is a one-hot constant. This
>> >would reduce the required coverage data to 1/8 of the size currently
>> >needed by bcov.
>> >
>> >  - add BYTE PTR [rip+0xadd88],1: 8-bit counter per superblock (or
>> >selected set of basic blocks). This might be useful for fuzzing.
>> >
>> >  - add DWORD PTR [rip+0xadd88],1: 32-bit counter per superblock (or
>> >selected set of basic blocks). This can be useful for profiling.
>> >
>> >
>> >So there is significant potential here. However, we have not explored
>> >these ideas any further, and we would highly appreciate any feedback
>> >about their viability and potential added value in comparison to what
>> >is already available in llvm-cov and sancov. (feedback needed)
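>> >
>> >A conceptual C model of the three alternatives above (the array names
>> >and sizes are illustrative; in the patched binary each probe would use
>> >a fixed address and constant, so each update is again a single
>> >rip-relative instruction):
>> >
>> >   #define NUM_PROBES 4096
>> >   static unsigned char cov_bit[NUM_PROBES / 8]; /* 1 bit per probe  */
>> >   static unsigned char cnt8[NUM_PROBES];        /* 8-bit counters   */
>> >   static unsigned int  cnt32[NUM_PROBES];       /* 32-bit counters  */
>> >
>> >   void hit(unsigned i) {
>> >       cov_bit[i / 8] |= (unsigned char)(1u << (i % 8));
>> >                                     /* or  BYTE  PTR [rip+off],c    */
>> >       cnt8[i]  += 1;                /* add BYTE  PTR [rip+off],1    */
>> >       cnt32[i] += 1;                /* add DWORD PTR [rip+off],1    */
>> >   }
>> >
>> >Note that, unlike mov, both or and add update eflags, which is why the
>> >liveness analysis mentioned above is a prerequisite for any of these
>> >variants.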
>> >
>> >To conclude, we introduced bcov [1] and discussed a few areas where
>> >the LLVM community can help us assess its potential, identify related
>> >work, and plan its future direction. Many thanks in advance for your
>> >feedback.
>> >
>> >
>> >Kind regards,
>> >
>> >Ammar
>> >
>> >
>> > [1]: https://github.com/abenkhadra/bcov
>> > [2]: https://arxiv.org/pdf/2004.14191.pdf
>> > [3]: https://dl.acm.org/doi/10.1145/174675.175935
>> > [4]: https://dl.acm.org/doi/10.1145/2931037.2931047
>>
>> Hi Ammar,
>>
>> Great work!
>>
>> I think the following statement in the paper is not true: "gcov relies
>> on debug information which is less accurate in optimized builds."
>> GCC's gcov implementation does not rely on debugging information. This
>> can be verified with -fdump-debug: `gcc --coverage -fdump-debug -c a.c`
>> dumps an empty a.c.*.debug file.
>>
>>
>After a second look at
>https://gcc.gnu.org/onlinedocs/gcc/Gcov-Intro.html#Gcov-Intro, it seems
>that I misunderstood why it is recommended to disable optimizations
>when using gcov. You are right. This should be fixed. Thanks!
>
>
>> clang supports gcov instrumentation as well. clang --coverage does
>> leverage debugging information, but the usage is very minimal:
>> debugging information is just used to retrieve the start line of a
>> function. I have studied and contributed a bit to clang's gcov
>> implementation recently and thus I am fairly confident about this.
>>
>> > In comparison, llvm-cov features a custom mapping format embedded in
>> > LLVM’s intermediate representation (IR).
>>
>> I don't know whether a clarification is needed here. llvm-cov is a
>> frontend tool which supports two formats: (1) coverage mapping
>> (clang -fprofile-instr-generate -fcoverage-mapping) and (2) gcov.
>> Using -fcoverage-mapping might be clearer in this context.
>>
>>
>Actually, we are referring to the first usage, which is, as I
>understand it, the more popular one. A clarification can indeed help
>here.
>
>
>> > Also, sancov is tightly coupled with LLVM sanitizers (e.g., ASan)
>> > which add varying overhead. Extending bcov with additional feedback
>> > signals, similar to sancov, is an interesting future work
>>
>> SanitizerCoverage is a standalone instrumentation pass
>> (llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp)
>> which is not coupled with asan. -fsanitize-coverage= can be used
>> standalone, or together with asan, lsan, msan, ubsan, etc.
>>
>> Its overhead can be very small, especially if you use the recent
>> inline-bool-flag
>> https://reviews.llvm.org/D77244
>> (`clang -fsanitize-coverage=inline-bool-flag a.c`) or inline-8bit-counters.
>> You can choose 3 modes: edge (default), bb, and func.
>> The flags are stored in a section __sancov_bools.
>> There is no standalone dumper though...
>>
>>
>>
>I remember that the term "tightly coupled" was used by a sancov
>developer in a GitHub issue. I cannot find that thread now. This
>perhaps refers to the fact that (please correct me) casual users are
>not expected to use sancov as an independent pass. Instead, they should
>rely on the runtime already provided by one of the sanitizers. I shall
>revisit this issue to improve the wording. However, I think that the
>main point still holds: given this dependence on the sanitizer runtime
>and the variety of options offered by sancov, one cannot refer to a
>'reference' sancov when reporting its instrumentation overhead.
>Nevertheless, the functionality offered by the sancov option
>'trace-pc-guard' is perhaps closest to that of bcov. If we use this as
>the basis for measurement, then the performance overhead should be
>small. Can you point me to instructions on how I can use sancov without
>any sanitizer runtime?

Sadly, it can't be done. trace-pc-guard is the only feature which can
currently dump information to a .sancov file, which can then be
inspected by the 'sancov' tool.

The runtime must be provided by a sanitizer, i.e., one of
-fsanitize={address,memory,thread,fuzzer,undefined,cfi}. So saying that
SanitizerCoverage is coupled with sanitizers is probably fine.

>> A few random comments below.
>>
>>
>> If I understand correctly, by leveraging the superblock dominator
>> graph (from "Dominators, Super Blocks, and Program Coverage") and the
>> any-node policy, the coverage result is binary: not-covered vs.
>> covered. I agree that in many cases this is sufficient. gcov and
>> -fcoverage-mapping, on the other hand, can provide line execution
>> counts, which are handy in other cases. This fact should probably be
>> mentioned somewhere, e.g., in "RQ2: Instrumentation overhead" or
>> another place. Both -fcoverage-mapping and gcov provide more
>> information, so they have larger overhead (I agree that gcov isn't a
>> format optimized for file size - it uses uint32_t records everywhere,
>> which can be quite expensive for representing smaller integers).
>>
>>
>>
>Yes, the reported coverage is binary. It indicates whether a basic block is
>covered or not based on the coverage of its superblock (a set of basic
>blocks with the same coverage information). Profiling represents an
>interesting use case, indeed. I noted in the 'Potential future work'
>section that updating coverage data using:
>
>add DWORD PTR [rip+0xadd88],1
>
>instead of the current:
>
>mov BYTE PTR [rip+0xadd88],1
>
>enables profiling using a 32-bit counter per superblock. However, a
>superblock can span multiple source lines, which makes such a counter
>difficult to relate to. Instead of grouping basic blocks based on
>superblocks, we would need to map each source line to its corresponding
>basic blocks in the binary and then choose the best BB to instrument
>with such a counter. This is expected to incur more overhead.
>
>Also, note that "RQ2: Instrumentation overhead" is concerned with
>evaluating the overhead of bcov only. We did not attempt to directly
>compare this overhead to that of llvm-cov or gcov since, as you already
>noted, these tools track different artifacts and are thus not directly
>comparable. However, we mentioned that llvm-cov can dump "320GB of
>coverage (and profiling) data" if online merging is not enabled. So there was an
>explicit emphasis on the role of profiling here. And that discussion was in
>the context of showing that the same online merging feature can be
>leveraged by bcov as well.
>
>
>
>> bcov currently rewrites program headers but not the section header
>> table, so for example, objdump -d a.any cannot disassemble the
>> synthesized code.
>>
>>
>>
>
>The original section header table is preserved. But adding two
>additional section headers to describe the patch segments would be a
>nice-to-have feature. I developed a simple tool to disassemble the
>patch code, which was sufficient for my debugging needs. I'm planning
>to open-source this tool later. In the
>meantime, using gdb as described in the artifacts repository (
>https://github.com/abenkhadra/bcov-artifacts/blob/master/sample-binaries/README.md)
>might be the easiest way to inspect the patch code.
>
>
>
>> bcov depends on capstone, which appears to be a pretty standard
>> disassembly tool... It is based on a ~2013 slimmed-down version of
>> LLVM MC and LLVM*Disassembler. The generated files (e.g.
>> arch/X86/X86DisassemblerDecoder.c) get ad-hoc, sporadic amendments
>> from contributors over the years. It is a bit unfortunate for LLVM
>> that MC and LLVM*Disassembler (Machine Code, including
>> assembly/disassembly libraries) cannot be easily adapted by downstream
>> users... (Well, downstream packages can use LLVM's exported CMake
>> files, call find_package(LLVM), and link against some LLVM* libraries,
>> but this may pull in a large set of dependencies.)
>>

