[llvm-dev] Introducing the binary-level coverage analysis tool bcov

Fangrui Song via llvm-dev llvm-dev at lists.llvm.org
Sat Jun 27 20:59:19 PDT 2020


On 2020-06-26, Ammar Ben Khadra via llvm-dev wrote:
>## TL;DR
>
>We introduce bcov, an open-source binary-level coverage analysis tool [1].
>The details are discussed in our paper [2], which is accepted to
>ESEC/FSE'20. bcov statically instruments x86-64 ELF binaries without
>compiler support. It features several techniques that allow it to achieve
>high performance, transparency, and flexibility.
>
>For example, running "make check-llvm-codegen" with an instrumented version
>of llc introduces a mean performance overhead of about 32%. The overhead
>for smaller binaries can be as low as 2%. bcov accurately reports code
>coverage with a mean F-score of 99,86%. All of that without introducing any
>test regressions.
>
>
>## Design overview
>
>Given an ELF module as input, bcov will first analyze it to choose
>appropriate probe locations and estimate the size of the program segments
>required for patching. This analysis depends on the instrumentation policy
>chosen by the user. Then, bcov patches the input module by adding two
>program segments; one segment contains trampoline code and the other keeps
>coverage data.
>
>bcov uses trampolines to instrument the binary where it inserts probes in
>targeted locations to track basic block coverage. Each probe consists of a
>detour that diverts control flow to a designated trampoline. The trampoline
>updates coverage data using a single pc-relative mov instruction (e.g., mov
>BYTE PTR [rip+0xadd88],1), executes relocated instructions, and then
>restores control flow to its original state.
>
>To dump coverage data, the user can inject our small runtime library,
>libbcov-rt, using the LD_PRELOAD mechanism. This library scans the memory
>mappings of the current process to dump every data segment that starts with
>a specific magic number. This means that coverage data will be dumped
>separately for each ELF module. Dumping data takes place at process
>shutdown or upon receiving a user signal. The latter feature enables online
>coverage tracking.
>
>
>## Key features
>
>bcov integrates several techniques including:
>
> - Probe pruning. We instrument only what is necessary for tracking code
>coverage. To this end, we bring Agrawal's probe pruning technique [3] to
>binary-level instrumentation.
>
> - Precise CFG analysis. Imprecision in the recovered CFG can have several
>effects, which include causing the instrumented binary to crash. To improve
>CFG precision, we propose a novel jump table analysis and implement a
>variant of an available non-return analysis [4].
>
> - Static instrumentation. We implement several techniques like (1)
>optimized probe selection, (2) jump table instrumentation, (3) and
>systematic detour hosting.
>
>The result is that our approach can work transparently, and with low
>overhead, on large and well-tested C and C++ programs.
>
>
>## Relation to the compiler
>
>Code coverage analysis at the binary level, as implemented in bcov, can
>offer several benefits: first, it is highly flexible. A user can analyze
>the coverage of a particular module only. Even better, coverage tracking
>can be limited to specific functions, e.g., the ones affected by recent
>changes. Second, it is largely independent of the used compiler and the
>build system. We observed consistent results after experimenting with four
>different versions of gcc and clang and three different build types,
>namely, debug, release, and LTO. Finally, the primitives we use to divert
>control-flow and update coverage are simple. One can imagine (and hope)
>that extending bcov to system code, or other native languages, will not
>require a lot of work.
>
>However, the viability of static binary-level coverage analysis ultimately
>depends on the compiler toolchain that produces the binary. Specifically,
>we can think of two areas where compilers might help our tool by adding
>extra support, or at least by not emitting code that is difficult to
>analyze:
>
> - Source-level coverage. The ability to know whether a particular basic
>block was covered in the binary is quite interesting. However, developers
>would ultimately want to map this coverage information to source-level
>artifacts. Statement coverage can be supported already with the help of
>dwarf debugging information. But bcov can report the coverage of individual
>branch decisions at the binary level. Therefore, leveraging this
>information to obtain source-level MC/DC seems to be a natural next step.
>Ideally, we want to infer MC/DC using what we have already, namely,
>binary-level coverage and dwarf information. However, adding support to a
>custom mapping format similar to the one available in llvm-cov might be
>necessary. In this regard, we appreciate your suggestions on the best way
>to support MC/DC, if at all feasible. (feedback needed)
>
> - Function model. We assume that a function occupies a continuous code
>region. The function's main entry is expected to be at the beginning of
>this region. Also, we consider each landing pad to be a function entry as
>well. Based on this, we assume that the identified entries predominate all
>basic blocks inside the function. In other words, a function can not reach
>an arbitrary basic block inside another function without first visiting one
>of its entries. This assumption seems to hold pretty well in the binaries
>we experimented with, but we are not sure to what extent it generalizes to
>other compilers and ISAs. (feedback needed)
>
>
>## Potential future work
>
>Updating coverage using a single pc-relative mov instruction is transparent
>in the sense that it does not clobber any general-purpose register. It even
>preserves the CPU's eflags. This, in turn, has greatly simplified the
>implementation of our prototype. That is, we did not have to think about
>saving/restoring the CPU state. However, putting mechanisms in place to
>allow eflags to be *safely* clobbered, e.g., liveness analysis, can open
>the door for several design alternatives where a pc-relative mov (e.g., mov
>BYTE PTR [rip+0xadd88],1) can be replaced with:
>
>  -  or BYTE PTR [rip+0xadd88],c: where c is one-hot constant. This will
>reduce the size of required coverage data to 1/8 of the size currently
>required by bcov.
>
>  - add BYTE PTR [rip+0xadd88],1: 8-bit counter per superblock (or selected
>set of basic blocks). This might be useful for fuzzing.
>
>  - add DWORD PTR [rip+0xadd88],1: 32-bit counter per superblock (or
>selected set of basic blocks). This can be useful for profiling.
>
>
>So there is significant potential here. However, we have not explored these
>ideas any further, and we would highly appreciate any feedback about their
>viability and potential added value in comparison to what is already
>available in llvm-cov and sancov. (feedback needed)
>
>To conclude, we introduced bcov [1] and discussed a few areas where the
>LLVM community can help us assess its potential, identify related work, and
>plan its future direction. Many thanks in advance for your feedback.
>
>
>Kind regards,
>
>Ammar
>
>
> [1]: https://github.com/abenkhadra/bcov
> [2]: https://arxiv.org/pdf/2004.14191.pdf
> [3]: https://dl.acm.org/doi/10.1145/174675.175935
> [4]: https://dl.acm.org/doi/10.1145/2931037.2931047

Hi Ammar,

Great work!

I think the following statement in the paper is not true: "gcov relies on debug
information which is less accurate in optimized builds." GCC's gcov
implementation does not rely on debugging information. This can be verified with
-fdump-debug: `gcc --coverage -fdump-debug -c a.c` dumps an empty a.c.*.debug
file.

clang supports gcov instrumentation as well. clang --coverage does leverage
debugging information, but the usage is very minimum - debugging information is
just used to retrieve the start line of a function. I have studied and
contributed a bit to clang's gcov implementation recently and thus I am fairly
confident about this.

> In comparison, llvm-cov features a custom mapping format embedded in
> LLVM’s intermediate representation (IR).

I don't know whether a clarification is needed here. llvm-cov is a frontend tool
which supports two formats: (1) coverage mapping (clang -fprofile-instr-generate
-fcoverage-mapping) and (2) gcov. Using -fcoverage-mapping might be clearer in
the context.

> Also, sancov is tightly coupled with LLVM sanitizers (e.g., ASan) which add
> varying overhead. Extending bcov with additional feedback signals, similar to
> sancov, is an interesting future work

SanitizerCoverage is a standalone instrumentation pass
(llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp)
which is not coupled with asan. -fsanitize-coverage= can be used standalone, or 
together with asan, lsan, msan, ubsan, etc.

Its overhead can be very small, especially if you use the recent inline-bool-flag
https://reviews.llvm.org/D77244
(`clang -fsanitize-coverage=inline-bool-flag a.c`) or inline-8bit-counters.
You can choose 3 modes: edge (default), bb, and func.
The flags are stored in a section __sancov_bools.
There is no standalone dumper though...


A few random comments below.


If I understand correctly, by leveraging superblock dominator graph (from
"Dominators, Super Blocks, and Program Coverage") and the any-node policy, the
coverage result is binary: not-covered vs covered. I agree that in many cases
this is sufficient. gcov and -fcoverage-mapping, on the other hand, can provide
line execution counts, which are handy in other cases. This fact should probably
mentioned somewhere, "RQ2: Instrumentation overhead" or another place.  Both
-fcoverage-mapping and gcov provide more information, so they have larger
overhead (I agree that gcov isn't a format optimized for file sizes - it uses
uint32_t records everywhere which can be quite expensive representing smaller
integers).


bcov currently rewrites program headers but not the section header table, so for
example, objdump -d a.any cannot disassemble the synthesized code.


bcov depends on capstone, which appears to be a pretty standard disassembly
tool... It is based on a ~2013 shimmed-down version of LLVM MC and
LLVM*Disassembler. The generated files (e.g. arch/X86/X86DisassemblerDecoder.c)
get ad-hoc sporadic amendment from contributors over the years.. It is a bit
unfortunate for LLVM that MC and LLVM*Disassembler (Machine Code, including
assembly/disassembly libraries) cannot be easily adapted by downstream
users..... (Well, downstream packages can use LLVM exported CMake files and use
find_program(LLVM) and link against some LLVM* libraries but this may pull in a
large set of dependencies)


More information about the llvm-dev mailing list