[llvm-dev] RFC: Comprehensive Static Instrumentation

Thu Jun 16 12:51:44 PDT 2016

Dear TB,

Comments inline below.

On 6/16/16 11:01 AM, TB Schardl via llvm-dev wrote:
>
> Hey LLVM-dev,
>
>
> We propose to build the CSI framework to provide a comprehensive suite 
> of compiler-inserted instrumentation hooks that dynamic-analysis tools 
> can use to observe and investigate program runtime behavior. 
> Traditionally, tools based on compiler instrumentation would each 
> separately modify the compiler to insert their own instrumentation.  
> In contrast, CSI inserts a standard collection of instrumentation 
> hooks into the program-under-test.  Each CSI-tool is implemented as a 
> library that defines relevant hooks, and the remaining hooks are 
> "nulled" out and elided during link-time optimization (LTO), resulting 
> in instrumented runtimes on par with custom instrumentation.  CSI 
> allows many compiler-based tools to be written as simple libraries 
> without modifying the compiler, greatly lowering the bar for
>
> developing dynamic-analysis tools.
>

Can you clarify the scope of tools that you want to develop?  Are these 
profiling tools, security enforcement tools, debugging tools, etc?  The 
type of tools you want to build will dictate whether such a framework 
makes sense.

>
> ================
>
> Motivation
>
> ================
>
>
> Key to understanding and improving the behavior of any system is 
> visibility -- the ability to know what is going on inside the system.  
> Various dynamic-analysis tools, such as race detectors, memory 
> checkers, cache simulators, call-graph generators, code-coverage 
> analyzers, and performance profilers, rely on compiler instrumentation 
> to gain visibility into the program behaviors during execution.  With 
> this approach, the tool writer modifies the compiler to insert 
> instrumentation code into the program-under-test so that it can 
> execute behind the scene while the program-under-test runs.  This 
> approach, however, means that the development of new tools requires 
> compiler work, which many potential tool writers are ill equipped to 
> do, and thus raises the bar for building new and innovative tools.
>

> The goal of the CSI framework is to provide comprehensive static 
> instrumentation through the compiler, in order to simplify the task of 
> building efficient and effective platform-independent tools.  The CSI 
> framework allows the tool writer to easily develop analysis tools that 
> require
>
> compiler instrumentation without needing to understand the compiler 
> internals or modifying the compiler, which greatly lowers the bar for 
> developing dynamic-analysis tools.
>
>
> ================
>
> Approach
>
> ================
>
>
> The CSI framework inserts instrumentation hooks at salient locations 
> throughout the compiled code of a program-under-test, such as function 
> entry and exit points, basic-block entry and exit points, before and 
> after each memory operation, etc.  Tool writers can instrument a 
> program-under-test simply by first writing a library that defines the 
> semantics of relevant hooks
>
> and then statically linking their compiled library with the 
> program-under-test.
>
>
> At first glance, this brute-force method of inserting hooks at every 
> salient location in the program-under-test seems to be replete with 
> overheads. CSI overcomes these overheads through the use of 
> link-time-optimization (LTO), which is now readily available in most 
> major compilers, including GCC and LLVM.  Using LTO, instrumentation 
> hooks that are not used by a particular tool can be elided, allowing 
> the overheads of these hooks to be avoided when the
>
> instrumented program-under-test is run.
>

The algorithms for optimizing away run-time hooks is not necessarily 
uniform.  For example, if a tool instruments loads and stores to collect 
the set of memory locations accessed by a program, then optimizing away 
a redundant check on a store is okay.  If the instrumentation is meant 
to enforce memory safety, then redundant checks can only be optimized 
away if there is no intervening call to free() between the two checks 
(which may require inter-procedural analysis to determine).  In such a 
case, you either need to be very pessimistic about the optimizations 
that you use, or you will get incorrect optimizations for certain 
classes of dynamic analyses.

> Furthermore, LTO can optimize a tool's instrumentation within a 
> program using traditional compiler optimizations.  Our initial study 
> indicates that the use of LTO does not unduly slow down the build 
> time, and the LTO can indeed optimize away unused hooks.  One of our 
> experiments with Apache HTTP server shows that, compiling with CSI and 
> linking with the "null" CSI-tool (which consists solely of empty 
> hooks) slows down the build time of the Apache HTTP server by less 
> than 40%, and the resulting tool-instrumented executable is as fast as 
> the original uninstrumented code.
>

Is your intention to have a compiler flag that enables insertions of 
hooks, or are you planning on having a pass always add the hooks and 
having libLTO always remove them?  I assume the former, but you should 
probably clarify.

>
> ================
>
> CSI version 1
>
> ================
>
>
> The initial version of CSI supports a basic set of hooks that covers 
> the following categories of program objects: functions, function exits 
> (specifically, returns), basic blocks, call sites, loads, and stores.
>

Don't forget that atomic instructions (e.g., compare-and-swap) and some 
of the intrinsics (e.g., llvm.memcpy()) also access memory.

> We prioritized instrumenting these IR objects based on the need of 
> seven example CSI tools, including a race detector, a cache-reuse 
> analyzer, and a code-coverage analyzer.  We plan to evolve the CSI API 
> over time to be more comprehensive, and we have designed the CSI API 
> to be extensible, allowing new instrumentation to be added as needs 
> grow.  We chose to initially implement a minimal "core" set of hooks, 
> because we felt it was best to add new instrumentation on an as-needed 
> basis in order to keep the interface simple.
>
>
> There are three salient features about the design of CSI.  First, CSI 
> assigns each instrumented program object a unique integer identifier 
> within one of the (currently) six program-object categories.  Within 
> each category, the ID's are consecutively numbered from 0 up to the 
> number of such objects minus 1.  The contiguous assignment of the ID's 
> allows the tool writer to easily keep track of IR objects in the 
> program and iterate through all objects in a category (whether the 
> object is encountered during execution or not).  Second, CSI provides 
> a platform-independent means to relate a given program object to 
> locations in the source code. Specifically, CSI provides 
> "front-end-data (FED)" tables, which provide file name and source 
> lines for each program object given the object's ID.  Third, each CSI 
> hook takes in as a parameter a "property": a 64-bit unsigned integer 
> that CSI uses to export the results of compiler analyses and other 
> information known at compile time.  The use of properties allow the 
> tool to rely on compiler analyses to optimize instrumentation and 
> decrease overhead.  In particular, since the value of a property is 
> known at compile time, LTO can constant-fold the conditional test 
> around a property to elide unnecessary instrumentation.
>
>
> ================
>
> Future plan
>
> ================
>
>
> We plan to expand CSI in future versions by instrumenting additional 
> program objects, such as atomic instructions, floating-point 
> instructions, and exceptions.  We are also planning to provide 
> additional static information to tool writers, both through 
> information encoded in the properties passed to hooks and by other 
> means.  In particular, we are also looking at mechanisms to present 
> tool writers with more complex static information, such as how 
> different program objects relate to each other, e.g., which basic 
> blocks belong to a given function.
>

As an aside, I'm not sure that I buy the idea that tool developers 
should be oblivious to the compiler internals.  If a tool developer 
doesn't understand what the compiler is doing, then she/he may not 
understand the results of the output.  For example, LLVM load/stores do 
not include stack spill slots, memory accesses incurred by calling 
conventions, etc.  Depending on where the instrumentation passes are 
placed in the pass pipeline, instrumentation calls can be moved or 
removed (perhaps in undesirable ways for some dynamic analysis 
applications).

I would also argue that a key design feature of LLVM is to make writing 
such passes simple, and I think LLVM accomplishes this.  If one 
understands how to build an efficient dynamic analysis, one can probably 
handle writing the compiler passes.

Overall, I think having common instrumentation infrastructure is a good 
thing.  However, I'm not sure how common it can be across different 
applications of instrumentation.  As an example, most memory safety 
solutions have the same instrumentation and optimization needs (and 
constraints on optimization) regardless of how they implement the checks 
on loads and stores.  However, it's less clear to me that a race 
detector and a memory safety compiler can safely use the same 
optimizations.  You may find yourself implementing a common 
infrastructure with each tool implementing specialized optimizations to 
make each type of dynamic analysis really fast.

Food for thought,

John Criswell

>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160616/6f79aa3a/attachment-0001.html>