[llvm-dev] RFC: EfficiencySanitizer

Wed Apr 20 04:58:31 PDT 2016

Hi Derek,

I'm not an expert in any of these topics, but I'm excited that you
guys are doing it. It seems like a missing piece that needs to be
filled.

Some comments inline...

On 17 April 2016 at 22:46, Derek Bruening via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> We would prefer to trade off accuracy and build a
> less-accurate tool below our overhead ceiling than to build a high-accuracy
> but slow tool.

I agree with this strategy.

As a first approach, make the tool as fast as you can, then later
introduce more probes, maybe via some slider flag (like -ON), to
consciously trade speed for accuracy.

> Studying instruction cache behavior with compiler
> instrumentation can be challenging, however, so we plan to at least
> initially focus on data performance.

I'm interested in how you're going to do this without kernel profiling
probes, like perf.

Or is the point here to introduce syscalls in the right places instead
of sampling at random? Wouldn't that bias your results?

> Many of our planned tools target specific performance issues with data
> accesses.  They employ the technique of *shadow memory* to store metadata
> about application data references, using the compiler to instrument loads
> and stores with code to update the shadow memory.

Is it just counting the number of reads/writes? Or are you also going
to record how many of those accesses missed the cache?
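
To make the question concrete, here is a minimal sketch of the
shadow-memory idea quoted above. It only counts reads and writes per
cache line, since instrumentation alone cannot see hardware cache
misses. Everything in it is an assumption for illustration:
`ShadowMemory`, `LineCounters`, `onLoad` and `onStore` are hypothetical
names, the 64-byte line size is assumed, and a hash map stands in for
the direct address-to-shadow mapping a real tool would more likely use.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical per-cache-line metadata kept in shadow memory.
struct LineCounters {
  uint64_t Reads;
  uint64_t Writes;
};

class ShadowMemory {
public:
  // Assumption: 64-byte cache lines, so the line index is Addr >> 6.
  static constexpr uintptr_t kLineShift = 6;

  // Calls the compiler would insert before each load and store.
  void onLoad(uintptr_t Addr) { ++Shadow[Addr >> kLineShift].Reads; }
  void onStore(uintptr_t Addr) { ++Shadow[Addr >> kLineShift].Writes; }

  // Returns zeroed counters for lines that were never touched.
  LineCounters lookup(uintptr_t Addr) const {
    auto It = Shadow.find(Addr >> kLineShift);
    return It == Shadow.end() ? LineCounters{} : It->second;
  }

private:
  // Stand-in for real shadow memory; a hash map keeps the sketch short.
  std::unordered_map<uintptr_t, LineCounters> Shadow;
};
```

Two accesses that fall in the same 64-byte line update the same
counters, which is the granularity the cache-oriented tools would care
about.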

> *Cache fragmentation*: this tool gathers data structure field hotness
> information, looking for data layout optimization opportunities by grouping
> hot fields together to avoid data cache fragmentation.  Future enhancements
> may add field affinity information if it can be computed with low enough
> overhead.

It would also be good to have temporal information, so that you can
correlate data accesses that occur, for example, inside the same loop /
basic block, or in sequence in the common CFG flow. This could lead to
changes in allocation patterns (heap, BSS).

> *Working set measurement*: this tool measures the data working set size of
> an application at each snapshot during execution.  It can help to understand
> phased behavior as well as providing basic direction for further effort by
> the developer: e.g., knowing whether the working set is close to fitting in
> current L3 caches or is many times larger can help determine where to spend
> effort.

This is interesting, but most useful when your dataset changes size
over different runs. This is similar to running the program under perf
for different workloads, and I'm not sure how you're going to get that
in a single run. It also comes with the additional problem that cache
sizes are not always advertised, so you might need an additional tool
to guess the sizes by increasing the size of data blocks and
finding steps in the data access graph.
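
The "finding steps" part could be sketched roughly as below. This is
only the analysis half: it assumes some separate microbenchmark has
already walked blocks of increasing size and recorded an average access
latency for each. `Sample`, `findCacheSteps` and the 1.5x jump threshold
are hypothetical choices, not anything from the RFC.

```cpp
#include <cstddef>
#include <vector>

// One measurement: a block of Bytes was walked repeatedly and the
// average latency per access was recorded (measurement not shown).
struct Sample {
  size_t Bytes;
  double NsPerAccess;
};

// Returns the block sizes just before each latency step, i.e. the
// largest measured size that still appeared to fit in some cache level.
// Samples are assumed to be sorted by increasing Bytes.
std::vector<size_t> findCacheSteps(const std::vector<Sample> &Samples,
                                   double Ratio = 1.5) {
  std::vector<size_t> Steps;
  for (size_t I = 1; I < Samples.size(); ++I)
    if (Samples[I].NsPerAccess > Ratio * Samples[I - 1].NsPerAccess)
      Steps.push_back(Samples[I - 1].Bytes);
  return Steps;
}
```

A fixed ratio is a crude detector; noisy measurements would probably
need smoothing or several runs per size before the steps are reliable.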

> *Dead store detection*: this tool identifies dead stores (write-after-write
> patterns with no intervening read) as well as redundant stores (writes of
> the same value already in memory).  Xref the Deadspy paper from CGO 2012.

This should probably be spotted by the compiler, so I guess it's a
tool for compiler developers to spot missed optimisation opportunities
in the back-end.
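
For reference, the two patterns in the quoted description can be
modelled with a small per-address state machine. This is a sketch under
stated assumptions, not the RFC's or Deadspy's design: `DeadStoreDetector`
and its methods are hypothetical, state is kept per exact address (so
all accesses are assumed to be the same size and aligned), and a hash
map again stands in for shadow memory.

```cpp
#include <cstdint>
#include <unordered_map>

class DeadStoreDetector {
public:
  // Instrumentation hook for a load: the last store is now "used".
  void onLoad(uintptr_t Addr) { Shadow[Addr].ReadSinceWrite = true; }

  // Instrumentation hook for a store of Value to Addr.
  void onStore(uintptr_t Addr, uint64_t Value) {
    auto It = Shadow.find(Addr);
    if (It != Shadow.end() && It->second.Written) {
      // Write-after-write with no intervening read: previous store dead.
      if (!It->second.ReadSinceWrite)
        ++DeadStores;
      // Same value already in memory: this store is redundant.
      if (It->second.Value == Value)
        ++RedundantStores;
    }
    Shadow[Addr] = {Value, /*Written=*/true, /*ReadSinceWrite=*/false};
  }

  uint64_t deadStores() const { return DeadStores; }
  uint64_t redundantStores() const { return RedundantStores; }

private:
  struct State {
    uint64_t Value;
    bool Written;
    bool ReadSinceWrite;
  };
  std::unordered_map<uintptr_t, State> Shadow;
  uint64_t DeadStores = 0;
  uint64_t RedundantStores = 0;
};
```

Note that a store can bump both counters at once (same value written
twice with no read in between), and that the counts are dynamic: they
reflect one particular run, which is what makes this different from
what the compiler can prove statically.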

> *Single-reference*: this tool identifies data cache lines brought in but
> only read once.  These could be candidates for non-temporal loads.

That's nice, and it should be simple enough to get a report at the end.
This also seems to be a hint to compiler developers rather than users.
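
A report like that could come from something as small as the sketch
below, which keeps a saturating read count per cache line and lists the
lines read exactly once. The names are hypothetical and 64-byte lines
are assumed. It also simplifies the quoted definition: it counts reads
over the whole run and does not model a line being evicted and brought
in again.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

class SingleReferenceTracker {
public:
  // Instrumentation hook for a load.
  void onLoad(uintptr_t Addr) {
    uint8_t &Count = Reads[Addr >> 6]; // assumption: 64-byte lines
    if (Count < 2)
      ++Count; // saturate: only 0, 1 and "2 or more" matter
  }

  // End-of-run report: number of lines that were read exactly once.
  size_t singleReferenceLines() const {
    size_t N = 0;
    for (const auto &Entry : Reads)
      if (Entry.second == 1)
        ++N;
    return N;
  }

private:
  std::unordered_map<uintptr_t, uint8_t> Reads;
};
```

Two bits of state per line are enough here, which is part of why the
report should be cheap to produce.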

I think you guys have a nice set of tools to develop and I'm looking
forward to working with them.

cheers,
--renato