[llvm-dev] [EXTERNAL] Introducing the binary-level coverage analysis tool bcov

Ammar Ben Khadra via llvm-dev llvm-dev at lists.llvm.org
Sat Jun 27 01:43:51 PDT 2020


Hi Alan, thanks for pointing that out! I was not aware of this ongoing work.

I will have a deeper look at it to see how it can be complemented with
binary-level coverage. Thanks again.

- Ammar


On Fri, Jun 26, 2020 at 10:41 PM Phipps, Alan <a-phipps at ti.com> wrote:

> Hi Ammar, bcov sounds really interesting.
>
>
>
> I just completed implementation for LLVM source-based branch condition
> coverage, which extends llvm-cov visualization and makes use of (mostly)
> existing counter instrumentation to track True/False execution counts for
> leaf-conditions comprising any source-level Boolean expression.  I plan to
> upstream this support soon.  MC/DC would also be the next logical step.
>
>
>
> Having a binary-level coverage tool is a great complement, and perhaps you
> can utilize the extensions I made llvm-cov for visualization.
>
>
>
> -Alan Phipps
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *Ammar
> Ben Khadra via llvm-dev
> *Sent:* Friday, June 26, 2020 4:51 AM
> *To:* llvm-dev list
> *Subject:* [EXTERNAL] [llvm-dev] Introducing the binary-level coverage
> analysis tool bcov
>
>
>
>
>
> ## TL;DR
>
> We introduce bcov, an open-source binary-level coverage analysis tool [1].
> The details are discussed in our paper [2], which is accepted to
> ESEC/FSE'20. bcov statically instruments x86-64 ELF binaries without
> compiler support. It features several techniques that allow it to achieve
> high performance, transparency, and flexibility.
>
>
>
> For example, running "make check-llvm-codegen" with an instrumented
> version of llc introduces a mean performance overhead of about 32%. The
> overhead for smaller binaries can be as low as 2%. bcov accurately reports
> code coverage with a mean F-score of 99,86%. All of that without
> introducing any test regressions.
>
>
>
> ## Design overview
>
> Given an ELF module as input, bcov will first analyze it to choose
> appropriate probe locations and estimate the size of the program segments
> required for patching. This analysis depends on the instrumentation policy
> chosen by the user. Then, bcov patches the input module by adding two
> program segments; one segment contains trampoline code and the other keeps
> coverage data.
>
> bcov uses trampolines to instrument the binary where it inserts probes in
> targeted locations to track basic block coverage. Each probe consists of a
> detour that diverts control flow to a designated trampoline. The trampoline
> updates coverage data using a single pc-relative mov instruction (e.g., mov
> BYTE PTR [rip+0xadd88],1), executes relocated instructions, and then
> restores control flow to its original state.
>
> To dump coverage data, the user can inject our small runtime library,
> libbcov-rt, using the LD_PRELOAD mechanism. This library scans the memory
> mappings of the current process to dump every data segment that starts with
> a specific magic number. This means that coverage data will be dumped
> separately for each ELF module. Dumping data takes place at process
> shutdown or upon receiving a user signal. The latter feature enables online
> coverage tracking.
>
>
> ## Key features
>
> bcov integrates several techniques including:
>
>  - Probe pruning. We instrument only what is necessary for tracking code
> coverage. To this end, we bring Agrawal's probe pruning technique [3] to
> binary-level instrumentation.
>
>  - Precise CFG analysis. Imprecision in the recovered CFG can have several
> effects, which include causing the instrumented binary to crash. To improve
> CFG precision, we propose a novel jump table analysis and implement a
> variant of an available non-return analysis [4].
>
>  - Static instrumentation. We implement several techniques like (1)
> optimized probe selection, (2) jump table instrumentation, (3) and
> systematic detour hosting.
>
> The result is that our approach can work transparently, and with low
> overhead, on large and well-tested C and C++ programs.
>
>
> ## Relation to the compiler
>
> Code coverage analysis at the binary level, as implemented in bcov, can
> offer several benefits: first, it is highly flexible. A user can analyze
> the coverage of a particular module only. Even better, coverage tracking
> can be limited to specific functions, e.g., the ones affected by recent
> changes. Second, it is largely independent of the used compiler and the
> build system. We observed consistent results after experimenting with four
> different versions of gcc and clang and three different build types,
> namely, debug, release, and LTO. Finally, the primitives we use to divert
> control-flow and update coverage are simple. One can imagine (and hope)
> that extending bcov to system code, or other native languages, will not
> require a lot of work.
>
> However, the viability of static binary-level coverage analysis ultimately
> depends on the compiler toolchain that produces the binary. Specifically,
> we can think of two areas where compilers might help our tool by adding
> extra support, or at least by not emitting code that is difficult to
> analyze:
>
>  - Source-level coverage. The ability to know whether a particular basic
> block was covered in the binary is quite interesting. However, developers
> would ultimately want to map this coverage information to source-level
> artifacts. Statement coverage can be supported already with the help of
> dwarf debugging information. But bcov can report the coverage of individual
> branch decisions at the binary level. Therefore, leveraging this
> information to obtain source-level MC/DC seems to be a natural next step.
> Ideally, we want to infer MC/DC using what we have already, namely,
> binary-level coverage and dwarf information. However, adding support to a
> custom mapping format similar to the one available in llvm-cov might be
> necessary. In this regard, we appreciate your suggestions on the best way
> to support MC/DC, if at all feasible. (feedback needed)
>
>  - Function model. We assume that a function occupies a continuous code
> region. The function's main entry is expected to be at the beginning of
> this region. Also, we consider each landing pad to be a function entry as
> well. Based on this, we assume that the identified entries predominate all
> basic blocks inside the function. In other words, a function can not reach
> an arbitrary basic block inside another function without first visiting one
> of its entries. This assumption seems to hold pretty well in the binaries
> we experimented with, but we are not sure to what extent it generalizes to
> other compilers and ISAs. (feedback needed)
>
>
> ## Potential future work
>
> Updating coverage using a single pc-relative mov instruction is
> transparent in the sense that it does not clobber any general-purpose
> register. It even preserves the CPU's eflags. This, in turn, has greatly
> simplified the implementation of our prototype. That is, we did not have to
> think about saving/restoring the CPU state. However, putting mechanisms in
> place to allow eflags to be *safely* clobbered, e.g., liveness analysis,
> can open the door for several design alternatives where a pc-relative mov
> (e.g., mov BYTE PTR [rip+0xadd88],1) can be replaced with:
>
>   -  or BYTE PTR [rip+0xadd88],c: where c is one-hot constant. This will
> reduce the size of required coverage data to 1/8 of the size currently
> required by bcov.
>
>   - add BYTE PTR [rip+0xadd88],1: 8-bit counter per superblock (or
> selected set of basic blocks). This might be useful for fuzzing.
>
>   - add DWORD PTR [rip+0xadd88],1: 32-bit counter per superblock (or
> selected set of basic blocks). This can be useful for profiling.
>
>
> So there is significant potential here. However, we have not explored
> these ideas any further, and we would highly appreciate any feedback about
> their viability and potential added value in comparison to what is already
> available in llvm-cov and sancov. (feedback needed)
>
> To conclude, we introduced bcov [1] and discussed a few areas where the
> LLVM community can help us assess its potential, identify related work, and
> plan its future direction. Many thanks in advance for your feedback.
>
>
> Kind regards,
>
> Ammar
>
>
>  [1]: https://github.com/abenkhadra/bcov
>  [2]: https://arxiv.org/pdf/2004.14191.pdf
>  [3]: https://dl.acm.org/doi/10.1145/174675.175935
>  [4]: https://dl.acm.org/doi/10.1145/2931037.2931047
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200627/2d229f1f/attachment.html>


More information about the llvm-dev mailing list