[llvm-dev] RFC: Comprehensive Static Instrumentation

Fri Jun 17 08:17:39 PDT 2016

On Fri, Jun 17, 2016 at 9:29 AM, Craig, Ben via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On 6/16/2016 2:48 PM, Mehdi Amini via llvm-dev wrote:
>
>
> On Jun 16, 2016, at 9:01 AM, TB Schardl via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> The CSI framework inserts instrumentation hooks at salient locations
> throughout the compiled code of a program-under-test, such as function
> entry and exit points, basic-block entry and exit points, before and after
> each memory operation, etc.  Tool writers can instrument a
> program-under-test simply by first writing a library that defines the
> semantics of relevant hooks and then statically linking their compiled
> library with the program-under-test.
>
> At first glance, this brute-force method of inserting hooks at every
> salient location in the program-under-test seems to be replete with
> overheads.  CSI overcomes these overheads through the use of
> link-time-optimization (LTO), which is now readily available in most major
> compilers, including GCC and LLVM.  Using LTO, instrumentation hooks that
> are not used by a particular tool can be elided, allowing the overheads of
> these hooks to be avoided when the
>
>
> I don't understand this flow: the front-end emits all the possible
> instrumentation but the useless calls to the runtime will be removed during
> the link?
> It means that the final binary is specialized for a given tool right? What
> is the advantage of generating this useless instrumentation in the first
> place then? I'm missing a piece here...
>
> Suppose I want to build a production build, and one build for each of
> ASAN, MSAN, UBSAN, and TSAN.
>
> With the current approach, I need to compile my source five different
> times, and link five different times.
>
> With the CSI approach (assuming it was the backing technology behind the
> sanitizers), I need to compile twice (once for production, once for
> instrumentation), then LTO-link five times.  I can reuse my .o files across
> the sanitizer types.
>

> It's possible that the math doesn't really work out in practice if the
> cost of the LTO-link dwarfs the compile times.
>
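
To make the arithmetic above concrete, here is a rough cost model in C. It is only a sketch: the struct, the function names, and the idea of a single "cost" unit per compile or link are assumptions made for illustration, not measurements of any real build.

```c
#include <assert.h>

/* Hypothetical cost model; all names and units are illustrative only. */
struct build_costs {
    double compile;   /* one full compile of all sources */
    double link;      /* one ordinary (non-LTO) link */
    double lto_link;  /* one LTO link */
};

/* Current approach: every configuration is compiled and linked separately. */
double cost_separate_builds(struct build_costs c, int configs) {
    assert(configs > 0);
    return configs * (c.compile + c.link);
}

/* CSI-style approach as described above: two compiles (production and
 * instrumented), then one LTO link per configuration, reusing the .o files. */
double cost_shared_objects(struct build_costs c, int configs) {
    assert(configs > 0);
    return 2.0 * c.compile + configs * c.lto_link;
}
```

With five configurations, the shared-object scheme comes out ahead only while 3 * compile > 5 * (lto_link - link), which is the caveat about LTO-link cost stated above.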

Other than the build time, we should also consider the performance of the
produced binary, which might be more important.
I have a hard time believing that the LTO-link-optimized (CSI version 1)
binary could beat the original ASan binary built with IR instrumentation.
With IR instrumentation, the binary can benefit from both problem-specific
domain knowledge and comprehensive compiler optimizations:
e.g., inlining small code to avoid a context switch, skipping redundant
load/store instrumentation, and optimizing more aggressively since the
compiler sees everything.
I am not sure whether CSI could do any of these.
IMHO, CSI might be good for fast prototyping in research work but may fall
short when we are really serious about performance.
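
For readers who have not seen the hook model, here is a minimal sketch in C of what a tool library might look like. The hook names and signatures are hypothetical and are not taken from the CSI API; the hook call in instrumented_sum is written by hand to stand in for what an instrumentation pass would insert.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook names and signatures; the real CSI interface may differ. */
static unsigned long load_count = 0;

/* A hook this tool uses: count every load. */
void hook_before_load(const void *addr, size_t size) {
    (void)addr;
    (void)size;
    load_count++;
}

/* A hook this tool does not use. Its body is empty, so once LTO can see it,
 * the calls to it can be inlined away. */
void hook_before_store(const void *addr, size_t size) {
    (void)addr;
    (void)size;
}

/* Stand-in for instrumented program code. A compiler pass would insert the
 * hook call before the load; here it is written out manually. */
long instrumented_sum(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        hook_before_load(&a[i], sizeof a[i]);
        sum += a[i];
    }
    return sum;
}

/* Accessor so that the tool can report its result. */
unsigned long tool_load_count(void) {
    return load_count;
}
```

In this sketch the unused store hook costs nothing after LTO, but the load hook still runs once per access. An IR-level pass with domain knowledge could, for example, skip instrumenting accesses it can prove redundant, which is the gap I am worried about.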

>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>