[llvm-dev] RFC: Comprehensive Static Instrumentation

TB Schardl via llvm-dev llvm-dev at lists.llvm.org
Thu Jun 16 09:14:51 PDT 2016

CC'ing the mailing list for the CSI project.

On Thu, Jun 16, 2016 at 12:01 PM, TB Schardl <neboat at mit.edu> wrote:

> Hey LLVM-dev,
> We propose to build the CSI framework to provide a comprehensive suite of
> compiler-inserted instrumentation hooks that dynamic-analysis tools can use
> to observe and investigate program runtime behavior.  Traditionally, tools
> based on compiler instrumentation would each separately modify the compiler
> to insert their own instrumentation.  In contrast, CSI inserts a standard
> collection of instrumentation hooks into the program-under-test.  Each
> CSI-tool is implemented as a library that defines relevant hooks, and the
> remaining hooks are "nulled" out and elided during link-time optimization
> (LTO), resulting in instrumented runtimes on par with custom
> instrumentation.  CSI allows many compiler-based tools to be written as
> simple libraries without modifying the compiler, greatly lowering the bar
> for
> developing dynamic-analysis tools.
> ================
> Motivation
> ================
> Key to understanding and improving the behavior of any system is
> visibility -- the ability to know what is going on inside the system.
> Various dynamic-analysis tools, such as race detectors, memory checkers,
> cache simulators, call-graph generators, code-coverage analyzers, and
> performance profilers, rely on compiler instrumentation to gain visibility
> into the program behaviors during execution.  With this approach, the tool
> writer modifies the compiler to insert instrumentation code into the
> program-under-test so that it can execute behind the scene while the
> program-under-test runs.  This approach, however, means that the
> development of new tools requires compiler work, which many potential tool
> writers are ill equipped to do, and thus raises the bar for building new
> and innovative tools.
> The goal of the CSI framework is to provide comprehensive static
> instrumentation through the compiler, in order to simplify the task of
> building efficient and effective platform-independent tools.  The CSI
> framework allows the tool writer to easily develop analysis tools that
> require
> compiler instrumentation without needing to understand the compiler
> internals or modifying the compiler, which greatly lowers the bar for
> developing dynamic-analysis tools.
> ================
> Approach
> ================
> The CSI framework inserts instrumentation hooks at salient locations
> throughout the compiled code of a program-under-test, such as function
> entry and exit points, basic-block entry and exit points, before and after
> each memory operation, etc.  Tool writers can instrument a
> program-under-test simply by first writing a library that defines the
> semantics of relevant hooks
> and then statically linking their compiled library with the
> program-under-test.
> At first glance, this brute-force method of inserting hooks at every
> salient location in the program-under-test seems to be replete with
> overheads.  CSI overcomes these overheads through the use of
> link-time-optimization (LTO), which is now readily available in most major
> compilers, including GCC and LLVM.  Using LTO, instrumentation hooks that
> are not used by a particular tool can be elided, allowing the overheads of
> these hooks to be avoided when the
> instrumented program-under-test is run.  Furthermore, LTO can optimize a
> tool's instrumentation within a program using traditional compiler
> optimizations.  Our initial study indicates that the use of LTO does not
> unduly slow down the build time, and the LTO can indeed optimize away
> unused hooks.  One of our experiments with Apache HTTP server shows that,
> compiling with CSI and linking with the "null" CSI-tool (which consists
> solely of empty hooks) slows down the build time of the Apache HTTP server
> by less than 40%, and the resulting tool-instrumented executable is as fast
> as the original uninstrumented code.
> ================
> CSI version 1
> ================
> The initial version of CSI supports a basic set of hooks that covers the
> following categories of program objects: functions, function exits
> (specifically, returns), basic blocks, call sites, loads, and stores.  We
> prioritized instrumenting these IR objects based on the need of seven
> example CSI tools, including a race detector, a cache-reuse analyzer, and a
> code-coverage analyzer.  We plan to evolve the CSI API over time to be more
> comprehensive, and we have designed the CSI API to be extensible, allowing
> new instrumentation to be added as needs grow.  We chose to initially
> implement a minimal "core" set of hooks, because we felt it was best to add
> new instrumentation on an as-needed basis in order to keep the interface
> simple.
> There are three salient features about the design of CSI.  First, CSI
> assigns each instrumented program object a unique integer identifier within
> one of the (currently) six program-object categories.  Within each
> category, the ID's are consecutively numbered from 0 up to the number of
> such objects minus 1.  The contiguous assignment of the ID's allows the
> tool writer to easily keep track of IR objects in the program and iterate
> through all objects in a category (whether the object is encountered during
> execution or not).  Second, CSI provides a platform-independent means to
> relate a given program object to locations in the source code.
> Specifically, CSI provides "front-end-data (FED)" tables, which provide
> file name and source lines for each program object given the object's ID.
> Third, each CSI hook takes in as a parameter a "property": a 64-bit
> unsigned integer that CSI uses to export the results of compiler analyses
> and other information known at compile time.  The use of properties allow
> the tool to rely on compiler analyses to optimize instrumentation and
> decrease overhead.  In particular, since the value of a property is known
> at compile time, LTO can constant-fold the conditional test around a
> property to elide unnecessary instrumentation.
> ================
> Future plan
> ================
> We plan to expand CSI in future versions by instrumenting additional
> program objects, such as atomic instructions, floating-point instructions,
> and exceptions.  We are also planning to provide additional static
> information to tool writers, both through information encoded in the
> properties passed to hooks and by other means.  In particular, we are also
> looking at mechanisms to present tool writers with more complex static
> information, such as how different program objects relate to each other,
> e.g., which basic blocks belong to a given function.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160616/22de2d66/attachment.html>

More information about the llvm-dev mailing list