[llvm-dev] RFC: EfficiencySanitizer
Derek Bruening via llvm-dev
llvm-dev at lists.llvm.org
Wed Apr 20 16:50:31 PDT 2016
On Wed, Apr 20, 2016 at 7:58 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 17 April 2016 at 22:46, Derek Bruening via llvm-dev wrote:
> > Studying instruction cache behavior with compiler
> > instrumentation can be challenging, however, so we plan to at least
> > initially focus on data performance.
> I'm interested in how you're going to do this without kernel profiling
> probes, like perf.
> Or is the point here introducing syscalls in the right places instead
> of sampling randomly? Wouldn't that bias your results?
I'm not sure I understand the question: are you asking whether not
gathering data on time spent in the kernel is an issue? Or are you asking
how to measure aspects of performance without using sampling or hardware
perf counters?
> > Many of our planned tools target specific performance issues with data
> > accesses. They employ the technique of *shadow memory* to store metadata
> > about application data references, using the compiler to instrument loads
> > and stores with code to update the shadow memory.
> Is it just counting the number of reads/writes? Or are you going to
> add how many of those accesses were hit by a cache miss?
It varies by tool. The brief descriptions in the original email hopefully
shed some light; we are also sending separate RFCs for each tool (working
set was already sent). The cache frag tool is basically just counting,
yes. There is no cache miss information here: we are not using hardware
perf counters nor running a software cache simulation. We are measuring
particular aspects of application behavior that tend to affect performance,
often abstracted away from the precise microarchitecture you're running on.
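To make the shadow-memory counting concrete, here is a minimal sketch of what the compiler-inserted code around each load/store might boil down to. The 64-byte line granularity, 8-bit saturating counter, and the toy mask-based shadow mapping are all illustrative assumptions, not the actual EfficiencySanitizer layout:

```c
#include <stdint.h>

/* Hypothetical shadow mapping: one 8-bit counter per 64-byte cache line,
 * in a small fixed-size shadow region (illustrative only). */
#define CACHE_LINE_BITS 6
#define SHADOW_SLOTS (1u << 16)

static uint8_t shadow[SHADOW_SLOTS];

static inline uint8_t *shadow_for(uintptr_t app_addr) {
    /* Map an application address to its per-line shadow counter. */
    return &shadow[(app_addr >> CACHE_LINE_BITS) & (SHADOW_SLOTS - 1)];
}

/* What instrumentation inserted at each load/store reduces to:
 * bump a saturating per-cache-line access counter in shadow memory. */
static inline void on_access(uintptr_t app_addr) {
    uint8_t *s = shadow_for(app_addr);
    if (*s < UINT8_MAX)
        (*s)++;
}
```

Two accesses that fall in the same 64-byte line bump the same counter, which is exactly the "just counting" behavior described above.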
> > *Cache fragmentation*: this tool gathers data structure field hotness
> > information, looking for data layout optimization opportunities by
> > grouping hot fields together to avoid data cache fragmentation. Future
> > enhancements may add field affinity information if it can be computed
> > with low enough overhead.
> It would also be good to have temporal information, so that you can
> correlate data accesses that occur, for example, inside the same loop /
> basic block, or in sequence in the common CFG flow. This could lead to
> changes in allocation patterns (heap, BSS).
Agreed, we have thought about adding temporal information, though it would
cost more and we have not fleshed out the details.
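For readers unfamiliar with the data-layout optimization the cache-frag tool is aimed at, here is a hypothetical before/after of the hot/cold field splitting the hotness data would suggest (illustrative struct names and sizes; the tool reports hotness, it does not perform this refactoring):

```c
#include <stddef.h>

/* Original layout: the hot counter shares its cache line with cold
 * bytes, so scanning an array of these wastes most of each line fetched. */
struct node_mixed {
    long hot_count;        /* accessed every iteration */
    char cold_blob[120];   /* rarely accessed */
};

/* Layout suggested by field-hotness data: hot fields packed densely in
 * their own array, cold data kept elsewhere. */
struct node_hot  { long hot_count; };
struct node_cold { char cold_blob[120]; };
```

With the split layout, a 64-byte cache line holds eight hot counters instead of half of one mixed node, which is the fragmentation reduction the tool is meant to expose.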
> > *Working set measurement*: this tool measures the data working set size of
> > an application at each snapshot during execution. It can help to understand
> > phased behavior as well as providing basic direction for further effort by
> > the developer: e.g., knowing whether the working set is close to fitting in
> > current L3 caches or is many times larger can help determine where to focus
> > effort.
> This is interesting, but most useful when your dataset changes size
> over different runs. This is similar to running the program under perf
> for different workloads, and I'm not sure how you're going to get that
> in a single run. It also comes with the additional problem that cache
> sizes are not always advertised, so you might have an additional tool
> to guess the sizes based on increasing the size of data blocks and
> finding steps on the data access graph.
This tool is relatively agnostic to the precise details of the caches
beyond having its granularity based on the cache line size it assumes (64
bytes; this can be parameterized).
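A minimal sketch of how such a cache-size-agnostic working-set measurement could work, assuming only the 64-byte line granularity mentioned above: mark a bit per touched line, then at each snapshot count and clear the bits. The fixed-size bitmap and function names here are illustrative assumptions, not the actual tool's implementation:

```c
#include <stdint.h>
#include <string.h>

#define LINE_BITS 6          /* assumes 64-byte cache lines; parameterizable */
#define NLINES    (1u << 16) /* toy address range for illustration */

static uint8_t touched[NLINES / 8];  /* one bit per cache line */

/* Called (conceptually) on every load/store: mark the line as touched. */
static void record_access(uintptr_t addr) {
    uintptr_t line = (addr >> LINE_BITS) % NLINES;
    touched[line / 8] |= (uint8_t)(1u << (line % 8));
}

/* At each snapshot: the number of set bits is the working set in cache
 * lines for this interval; clear the bitmap to measure the next interval. */
static unsigned snapshot_working_set(void) {
    unsigned n = 0;
    for (size_t i = 0; i < sizeof(touched); i++)
        for (uint8_t b = touched[i]; b; b &= (uint8_t)(b - 1))
            n++;  /* Kernighan bit count */
    memset(touched, 0, sizeof(touched));
    return n;
}
```

Note the result is expressed in cache lines, not bytes, so it can be compared against any cache size after the fact without the tool knowing the cache hierarchy.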