<div dir="ltr"><br><div><div>Hi Alan, thanks for pointing that out! I was not aware of this ongoing work.<br><br>I will have a deeper look at it to see how it can be complemented with binary-level coverage. Thanks again.<br><br>- Ammar <br></div><div><br></div><div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jun 26, 2020 at 10:41 PM Phipps, Alan <<a href="mailto:a-phipps@ti.com">a-phipps@ti.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-US">
<div class="gmail-m_-377541980520649968WordSection1">
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Hi Ammar, bcov sounds really interesting.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">I just completed implementation for LLVM source-based branch condition coverage, which extends llvm-cov visualization and makes use of (mostly) existing counter
instrumentation to track True/False execution counts for leaf-conditions comprising any source-level Boolean expression. I plan to upstream this support soon. MC/DC would also be the next logical step.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">Having a binary-level coverage tool is a great complement, and perhaps you can utilize the extensions I made llvm-cov for visualization.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">-Alan Phipps<u></u><u></u></span></p>
<p class="MsoNormal"><a name="m_-377541980520649968__MailEndCompose"><span style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></a></p>
<p class="MsoNormal"><b><span style="font-size:10pt;font-family:Tahoma,sans-serif">From:</span></b><span style="font-size:10pt;font-family:Tahoma,sans-serif"> llvm-dev [mailto:<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>]
<b>On Behalf Of </b>Ammar Ben Khadra via llvm-dev<br>
<b>Sent:</b> Friday, June 26, 2020 4:51 AM<br>
<b>To:</b> llvm-dev list<br>
<b>Subject:</b> [EXTERNAL] [llvm-dev] Introducing the binary-level coverage analysis tool bcov<u></u><u></u></span></p>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">## TL;DR<br>
<br>
We introduce bcov, an open-source binary-level coverage analysis tool [1]. The details are discussed in our paper [2], which is accepted to ESEC/FSE'20. bcov statically instruments x86-64 ELF binaries without compiler support. It features several techniques
that allow it to achieve high performance, transparency, and flexibility. <u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">For example, running "make check-llvm-codegen" with an instrumented version of llc introduces a mean performance overhead of about 32%. The overhead for smaller binaries can be as low as 2%. bcov accurately reports code coverage with a
mean F-score of 99,86%. All of that without introducing any test regressions.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><br>
<br>
## Design overview<br>
<br>
Given an ELF module as input, bcov will first analyze it to choose appropriate probe locations and estimate the size of the program segments required for patching. This analysis depends on the instrumentation policy chosen by the user. Then, bcov patches the
input module by adding two program segments; one segment contains trampoline code and the other keeps coverage data.<br>
<br>
bcov uses trampolines to instrument the binary where it inserts probes in targeted locations to track basic block coverage. Each probe consists of a detour that diverts control flow to a designated trampoline. The trampoline updates coverage data using a single
pc-relative mov instruction (e.g., mov BYTE PTR [rip+0xadd88],1), executes relocated instructions, and then restores control flow to its original state.<br>
<br>
To dump coverage data, the user can inject our small runtime library, libbcov-rt, using the LD_PRELOAD mechanism. This library scans the memory mappings of the current process to dump every data segment that starts with a specific magic number. This means that
coverage data will be dumped separately for each ELF module. Dumping data takes place at process shutdown or upon receiving a user signal. The latter feature enables online coverage tracking.<br>
<br>
<br>
## Key features<br>
<br>
bcov integrates several techniques including:<br>
<br>
- Probe pruning. We instrument only what is necessary for tracking code coverage. To this end, we bring Agrawal's probe pruning technique [3] to binary-level instrumentation.<br>
<br>
- Precise CFG analysis. Imprecision in the recovered CFG can have several effects, which include causing the instrumented binary to crash. To improve CFG precision, we propose a novel jump table analysis and implement a variant of an available non-return analysis
[4].<br>
<br>
- Static instrumentation. We implement several techniques like (1) optimized probe selection, (2) jump table instrumentation, (3) and systematic detour hosting.<br>
<br>
The result is that our approach can work transparently, and with low overhead, on large and well-tested C and C++ programs.<br>
<br>
<br>
## Relation to the compiler<br>
<br>
Code coverage analysis at the binary level, as implemented in bcov, can offer several benefits: first, it is highly flexible. A user can analyze the coverage of a particular module only. Even better, coverage tracking can be limited to specific functions, e.g.,
the ones affected by recent changes. Second, it is largely independent of the used compiler and the build system. We observed consistent results after experimenting with four different versions of gcc and clang and three different build types, namely, debug,
release, and LTO. Finally, the primitives we use to divert control-flow and update coverage are simple. One can imagine (and hope) that extending bcov to system code, or other native languages, will not require a lot of work.<br>
<br>
However, the viability of static binary-level coverage analysis ultimately depends on the compiler toolchain that produces the binary. Specifically, we can think of two areas where compilers might help our tool by adding extra support, or at least by not emitting
code that is difficult to analyze:<br>
<br>
- Source-level coverage. The ability to know whether a particular basic block was covered in the binary is quite interesting. However, developers would ultimately want to map this coverage information to source-level artifacts. Statement coverage can be supported
already with the help of dwarf debugging information. But bcov can report the coverage of individual branch decisions at the binary level. Therefore, leveraging this information to obtain source-level MC/DC seems to be a natural next step. Ideally, we want
to infer MC/DC using what we have already, namely, binary-level coverage and dwarf information. However, adding support to a custom mapping format similar to the one available in llvm-cov might be necessary. In this regard, we appreciate your suggestions on
the best way to support MC/DC, if at all feasible. (feedback needed)<br>
<br>
- Function model. We assume that a function occupies a continuous code region. The function's main entry is expected to be at the beginning of this region. Also, we consider each landing pad to be a function entry as well. Based on this, we assume that the
identified entries predominate all basic blocks inside the function. In other words, a function can not reach an arbitrary basic block inside another function without first visiting one of its entries. This assumption seems to hold pretty well in the binaries
we experimented with, but we are not sure to what extent it generalizes to other compilers and ISAs. (feedback needed)<br>
<br>
<br>
## Potential future work<br>
<br>
Updating coverage using a single pc-relative mov instruction is transparent in the sense that it does not clobber any general-purpose register. It even preserves the CPU's eflags. This, in turn, has greatly simplified the implementation of our prototype. That
is, we did not have to think about saving/restoring the CPU state. However, putting mechanisms in place to allow eflags to be *safely* clobbered, e.g., liveness analysis, can open the door for several design alternatives where a pc-relative mov (e.g., mov
BYTE PTR [rip+0xadd88],1) can be replaced with:<br>
<br>
- or BYTE PTR [rip+0xadd88],c: where c is one-hot constant. This will reduce the size of required coverage data to 1/8 of the size currently required by bcov.<br>
<br>
- add BYTE PTR [rip+0xadd88],1: 8-bit counter per superblock (or selected set of basic blocks). This might be useful for fuzzing.<br>
<br>
- add DWORD PTR [rip+0xadd88],1: 32-bit counter per superblock (or selected set of basic blocks). This can be useful for profiling.<br>
<br>
<br>
So there is significant potential here. However, we have not explored these ideas any further, and we would highly appreciate any feedback about their viability and potential added value in comparison to what is already available in llvm-cov and sancov. (feedback
needed)<br>
<br>
To conclude, we introduced bcov [1] and discussed a few areas where the LLVM community can help us assess its potential, identify related work, and plan its future direction. Many thanks in advance for your feedback.<br>
<br>
<br>
Kind regards,<br>
<br>
Ammar<br>
<br>
<br>
[1]: <a href="https://github.com/abenkhadra/bcov" target="_blank">https://github.com/abenkhadra/bcov</a><br>
[2]: <a href="https://arxiv.org/pdf/2004.14191.pdf" target="_blank">https://arxiv.org/pdf/2004.14191.pdf</a><br>
[3]: <a href="https://dl.acm.org/doi/10.1145/174675.175935" target="_blank">https://dl.acm.org/doi/10.1145/174675.175935</a><br>
[4]: <a href="https://dl.acm.org/doi/10.1145/2931037.2931047" target="_blank">https://dl.acm.org/doi/10.1145/2931037.2931047</a><u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
</div>
</div>
</blockquote></div>