[llvm-dev] RFC: Comprehensive Static Instrumentation

Thu Jun 16 11:23:23 PDT 2016

I am very glad this project reached the state where we can start the public
code review. Please shoot the patches for review when ready.

--kcc

On Thu, Jun 16, 2016 at 12:14 PM, TB Schardl via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> CC'ing the mailing list for the CSI project.
>
> On Thu, Jun 16, 2016 at 12:01 PM, TB Schardl <neboat at mit.edu> wrote:
>
>> Hey LLVM-dev,
>>
>>
>> We propose to build the CSI framework to provide a comprehensive suite of
>> compiler-inserted instrumentation hooks that dynamic-analysis tools can use
>> to observe and investigate program runtime behavior.  Traditionally, tools
>> based on compiler instrumentation would each separately modify the compiler
>> to insert their own instrumentation.  In contrast, CSI inserts a standard
>> collection of instrumentation hooks into the program-under-test.  Each
>> CSI-tool is implemented as a library that defines relevant hooks, and the
>> remaining hooks are "nulled" out and elided during link-time optimization
>> (LTO), resulting in instrumented runtimes on par with custom
>> instrumentation.  CSI allows many compiler-based tools to be written as
>> simple libraries without modifying the compiler, greatly lowering the bar
>> for
>>
>> developing dynamic-analysis tools.
>>
>> ================
>>
>> Motivation
>>
>> ================
>>
>> Key to understanding and improving the behavior of any system is
>> visibility -- the ability to know what is going on inside the system.
>> Various dynamic-analysis tools, such as race detectors, memory checkers,
>> cache simulators, call-graph generators, code-coverage analyzers, and
>> performance profilers, rely on compiler instrumentation to gain visibility
>> into the program behaviors during execution.  With this approach, the tool
>> writer modifies the compiler to insert instrumentation code into the
>> program-under-test so that it can execute behind the scene while the
>> program-under-test runs.  This approach, however, means that the
>> development of new tools requires compiler work, which many potential tool
>> writers are ill equipped to do, and thus raises the bar for building new
>> and innovative tools.
>>
>> The goal of the CSI framework is to provide comprehensive static
>> instrumentation through the compiler, in order to simplify the task of
>> building efficient and effective platform-independent tools.  The CSI
>> framework allows the tool writer to easily develop analysis tools that
>> require
>>
>> compiler instrumentation without needing to understand the compiler
>> internals or modifying the compiler, which greatly lowers the bar for
>> developing dynamic-analysis tools.
>>
>> ================
>>
>> Approach
>>
>> ================
>>
>> The CSI framework inserts instrumentation hooks at salient locations
>> throughout the compiled code of a program-under-test, such as function
>> entry and exit points, basic-block entry and exit points, before and after
>> each memory operation, etc.  Tool writers can instrument a
>> program-under-test simply by first writing a library that defines the
>> semantics of relevant hooks
>>
>> and then statically linking their compiled library with the
>> program-under-test.
>>
>> At first glance, this brute-force method of inserting hooks at every
>> salient location in the program-under-test seems to be replete with
>> overheads.  CSI overcomes these overheads through the use of
>> link-time-optimization (LTO), which is now readily available in most major
>> compilers, including GCC and LLVM.  Using LTO, instrumentation hooks that
>> are not used by a particular tool can be elided, allowing the overheads of
>> these hooks to be avoided when the
>>
>> instrumented program-under-test is run.  Furthermore, LTO can optimize a
>> tool's instrumentation within a program using traditional compiler
>> optimizations.  Our initial study indicates that the use of LTO does not
>> unduly slow down the build time, and the LTO can indeed optimize away
>> unused hooks.  One of our experiments with Apache HTTP server shows that,
>> compiling with CSI and linking with the "null" CSI-tool (which consists
>> solely of empty hooks) slows down the build time of the Apache HTTP server
>> by less than 40%, and the resulting tool-instrumented executable is as fast
>> as the original uninstrumented code.
>>
>> ================
>>
>> CSI version 1
>>
>> ================
>>
>> The initial version of CSI supports a basic set of hooks that covers the
>> following categories of program objects: functions, function exits
>> (specifically, returns), basic blocks, call sites, loads, and stores.  We
>> prioritized instrumenting these IR objects based on the need of seven
>> example CSI tools, including a race detector, a cache-reuse analyzer, and a
>> code-coverage analyzer.  We plan to evolve the CSI API over time to be more
>> comprehensive, and we have designed the CSI API to be extensible, allowing
>> new instrumentation to be added as needs grow.  We chose to initially
>> implement a minimal "core" set of hooks, because we felt it was best to add
>> new instrumentation on an as-needed basis in order to keep the interface
>> simple.
>>
>> There are three salient features about the design of CSI.  First, CSI
>> assigns each instrumented program object a unique integer identifier within
>> one of the (currently) six program-object categories.  Within each
>> category, the ID's are consecutively numbered from 0 up to the number of
>> such objects minus 1.  The contiguous assignment of the ID's allows the
>> tool writer to easily keep track of IR objects in the program and iterate
>> through all objects in a category (whether the object is encountered during
>> execution or not).  Second, CSI provides a platform-independent means to
>> relate a given program object to locations in the source code.
>> Specifically, CSI provides "front-end-data (FED)" tables, which provide
>> file name and source lines for each program object given the object's ID.
>> Third, each CSI hook takes in as a parameter a "property": a 64-bit
>> unsigned integer that CSI uses to export the results of compiler analyses
>> and other information known at compile time.  The use of properties allow
>> the tool to rely on compiler analyses to optimize instrumentation and
>> decrease overhead.  In particular, since the value of a property is known
>> at compile time, LTO can constant-fold the conditional test around a
>> property to elide unnecessary instrumentation.
>>
>> ================
>>
>> Future plan
>>
>> ================
>>
>> We plan to expand CSI in future versions by instrumenting additional
>> program objects, such as atomic instructions, floating-point instructions,
>> and exceptions.  We are also planning to provide additional static
>> information to tool writers, both through information encoded in the
>> properties passed to hooks and by other means.  In particular, we are also
>> looking at mechanisms to present tool writers with more complex static
>> information, such as how different program objects relate to each other,
>> e.g., which basic blocks belong to a given function.
>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160616/7603fa0e/attachment.html>