[llvm-dev] RFC: EfficiencySanitizer

Sun Apr 17 14:46:29 PDT 2016

TL;DR: We plan to build a suite of compiler-based dynamic instrumentation
tools for analyzing targeted performance problems.  These tools will all
live under a new "EfficiencySanitizer" (or "esan") sanitizer umbrella, as
they will share significant portions of their implementations.

====================
Motivation
====================

Our goal is to build a suite of dynamic instrumentation tools for analyzing
particular performance problems that are difficult to evaluate using other
profiling methods.  Modern hardware performance counters provide insight
into where time is spent and when micro-architectural events such as cache
misses are occurring, but they are of limited effectiveness for contextual
analysis: it is not easy to answer *why* a cache miss occurred.

Planned tools include ones that identify wasted or redundant computation,
identify cache fragmentation, and measure working sets.  More details on
these appear below.

====================
Approach
====================

We believe that tools with overhead beyond about 5x are simply too
heavyweight to easily apply to large, industrial-sized applications running
real-world workloads.  Our goal is for our tools to gather useful
information with overhead less than 5x, and ideally closer to 3x, to
facilitate deployment.  We would rather trade off accuracy and build a
less-accurate tool below our overhead ceiling than build a high-accuracy
but slow tool.  We hope to hit a sweet spot of tools that gather
trace-based contextual information not feasible with pure sampling yet are
still practical to deploy.

In a similar vein, we would prefer a targeted tool that analyzes one
particular aspect of performance with low overhead over a more general tool
that can answer more questions but has high overhead.

Dynamic binary instrumentation is one option for these types of tools, but
typically compiler-based instrumentation provides better performance, and
we intend to focus only on analyzing applications for which source code is
available.  Studying instruction cache behavior with compiler
instrumentation can be challenging, however, so we plan to at least
initially focus on data performance.

Many of our planned tools target specific performance issues with data
accesses.  They employ the technique of *shadow memory* to store metadata
about application data references, using the compiler to instrument loads
and stores with code to update the shadow memory.  A companion runtime
library intercepts libc calls if necessary to update shadow memory on
non-application data references.  The runtime library also intercepts heap
allocations and other key events in order to perform its analyses.  This is
all very similar to how existing sanitizers such as AddressSanitizer,
ThreadSanitizer, MemorySanitizer, etc. operate today.
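
As a rough illustration of the shadow memory technique described above,
here is a minimal C++ sketch.  It is not esan's actual design: all names
are hypothetical, the 64-byte cache line size is an assumption, and a hash
map stands in for the arithmetic address-to-shadow mapping that existing
sanitizers typically use, purely to keep the example portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// All names here are hypothetical; this is not esan's actual interface.
constexpr std::uintptr_t kLineShift = 6;  // assumption: 64-byte cache lines
constexpr std::uint8_t kWasRead = 0x1;
constexpr std::uint8_t kWasWritten = 0x2;

class ShadowMemory {
public:
  // Returns the shadow byte for the cache line containing addr,
  // creating it (zero-initialized) on first use.
  std::uint8_t &shadowFor(const void *addr) {
    assert(addr != nullptr);
    return shadow_[reinterpret_cast<std::uintptr_t>(addr) >> kLineShift];
  }

  // Number of distinct cache lines that have a shadow entry.
  std::size_t linesTracked() const { return shadow_.size(); }

private:
  std::unordered_map<std::uintptr_t, std::uint8_t> shadow_;
};

// Stand-ins for the calls a compiler pass would insert next to each access.
inline void onLoad(ShadowMemory &s, const void *addr) {
  s.shadowFor(addr) |= kWasRead;
}
inline void onStore(ShadowMemory &s, const void *addr) {
  s.shadowFor(addr) |= kWasWritten;
}
```

In this sketch the instrumentation only sets one bit per access kind; a
real tool would choose its shadow encoding to suit the analysis it runs.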

====================
Example Tools
====================

We have several initial tools that we plan to build.  These are not
necessarily novel ideas on their own: some of these have already been
explored in academia.  The idea is to create practical, low-overhead,
robust, and publicly available versions of these tools.

*Cache fragmentation*: this tool gathers data structure field hotness
information, looking for data layout optimization opportunities by grouping
hot fields together to avoid data cache fragmentation.  Future enhancements
may add field affinity information if it can be computed with low enough
overhead.
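
To make the idea of field hotness concrete, the following C++ sketch keeps
one access counter per struct field and ranks the fields by count.  The
class, its methods, and the notion of a per-field index passed in by the
instrumentation are all assumptions made for illustration, not a
description of how the tool will be implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-field record: name, byte offset in the struct, hit count.
struct FieldCounter {
  std::string name;
  std::size_t offset;
  std::uint64_t accesses;
};

// Hypothetical profile for one struct type.
class StructProfile {
public:
  explicit StructProfile(std::vector<FieldCounter> fields)
      : fields_(std::move(fields)) {}

  // Stand-in for the counter increment inserted at each field access.
  void recordAccess(std::size_t fieldIndex) {
    assert(fieldIndex < fields_.size());
    ++fields_[fieldIndex].accesses;
  }

  // Field names ordered from most to least frequently accessed.
  std::vector<std::string> fieldsByHotness() const {
    std::vector<FieldCounter> sorted = fields_;
    std::stable_sort(sorted.begin(), sorted.end(),
                     [](const FieldCounter &a, const FieldCounter &b) {
                       return a.accesses > b.accesses;
                     });
    std::vector<std::string> names;
    for (const FieldCounter &f : sorted)
      names.push_back(f.name);
    return names;
  }

private:
  std::vector<FieldCounter> fields_;
};
```

A ranking like this is what would suggest moving the hottest fields next to
each other so that they share cache lines.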

*Working set measurement*: this tool measures the data working set size of
an application at each snapshot during execution.  It can help to
understand phased behavior and can provide basic direction for further
effort by the developer: e.g., knowing whether the working set is close to
fitting in current L3 caches or is many times larger can help determine
where to spend effort.
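
One simple way to picture working set measurement is to count the distinct
cache lines touched between snapshots.  The C++ sketch below does that with
a hash set.  The names, the 64-byte line size, and the choice to reset the
set at every snapshot are assumptions for illustration only; a low-overhead
tool would more likely keep this information in shadow memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

constexpr std::size_t kCacheLineBytes = 64;  // assumption

// Hypothetical tracker; not esan's actual interface.
class WorkingSetTracker {
public:
  // Stand-in for the call inserted at each load or store.
  void onAccess(const void *addr) {
    assert(addr != nullptr);
    touched_.insert(reinterpret_cast<std::uintptr_t>(addr) / kCacheLineBytes);
  }

  // Ends the current interval: returns the bytes covered by the cache
  // lines touched since the previous snapshot, then starts a new interval.
  std::size_t takeSnapshot() {
    std::size_t bytes = touched_.size() * kCacheLineBytes;
    touched_.clear();
    return bytes;
  }

private:
  std::unordered_set<std::uintptr_t> touched_;
};
```

Comparing successive snapshot values against a cache size is what would
reveal phases and show how far the working set is from fitting in cache.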

*Dead store detection*: this tool identifies dead stores (write-after-write
patterns with no intervening read) as well as redundant stores (writes of
the same value already in memory).  See the DeadSpy paper from CGO 2012.
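
The two store patterns defined above can be sketched with a small
per-location state machine.  This C++ example is an illustration under
stated assumptions, not the DeadSpy algorithm or esan's implementation: the
names are hypothetical, every access is assumed to be an aligned 8-byte
word, and a store is only classified once an earlier store to the same
location has been observed.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct StoreStats {
  std::uint64_t dead = 0;       // earlier store overwritten before any read
  std::uint64_t redundant = 0;  // store of the value already present
};

// Hypothetical detector; not esan's actual interface.
class DeadStoreDetector {
public:
  void onLoad(std::uintptr_t addr) {
    assert(addr % 8 == 0);  // assumption: aligned 8-byte accesses only
    auto it = state_.find(addr);
    if (it != state_.end())
      it->second.unread = false;  // the last store has now been observed
  }

  void onStore(std::uintptr_t addr, std::uint64_t value) {
    assert(addr % 8 == 0);
    auto it = state_.find(addr);
    if (it == state_.end()) {
      state_.emplace(addr, Location{value, true});
      return;
    }
    if (it->second.unread)
      ++stats_.dead;       // write-after-write with no intervening read
    if (it->second.value == value)
      ++stats_.redundant;  // same value as the previous store
    it->second = Location{value, true};
  }

  const StoreStats &stats() const { return stats_; }

private:
  struct Location {
    std::uint64_t value;  // last value stored
    bool unread;          // true until a load sees that value
  };
  std::unordered_map<std::uintptr_t, Location> state_;
  StoreStats stats_;
};
```

Note that in this sketch a single store can count toward both totals, for
example when the same value is written twice in a row without a read.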

*Single-reference*: this tool identifies data cache lines that are brought
into the cache but read only once.  These could be candidates for
non-temporal loads.
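
A minimal C++ sketch of the single-reference idea follows.  It keeps a
saturating read counter per cache line and reports the lines read exactly
once.  The names and the 64-byte line size are assumptions, and the sketch
ignores cache evictions entirely, so it approximates "read once during the
run" rather than "read once per residency in the cache".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

constexpr std::uintptr_t kLineBytes = 64;  // assumption

// Hypothetical tracker; not esan's actual interface.
class SingleReferenceTracker {
public:
  // Stand-in for the call inserted at each load.
  void onLoad(const void *addr) {
    assert(addr != nullptr);
    std::uint8_t &reads =
        reads_[reinterpret_cast<std::uintptr_t>(addr) / kLineBytes];
    if (reads < 2)
      ++reads;  // saturate: only "once" versus "more than once" matters
  }

  // Number of cache lines that were read exactly once.
  std::size_t singleReferenceLines() const {
    std::size_t count = 0;
    for (const auto &entry : reads_)
      if (entry.second == 1)
        ++count;
    return count;
  }

private:
  std::unordered_map<std::uintptr_t, std::uint8_t> reads_;
};
```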

====================
EfficiencySanitizer
====================

We are proposing the name EfficiencySanitizer, or "esan" for short, to
refer to this suite of dynamic instrumentation tools for improving program
efficiency.  As we have a number of different tools that share quite a bit
of their implementation, we plan to treat them as sub-tools under the
EfficiencySanitizer umbrella, rather than adding separate instrumentation
and runtime library components for each one.

While these tools are not addressing correctness issues like other
sanitizers, they will be sharing a lot of the existing sanitizer runtime
library support code.  Furthermore, users are already familiar with the
sanitizer brand, and it seems better to extend that concept rather than add
some new term.