[llvm-dev] RFC: EfficiencySanitizer
Yury Gribov via llvm-dev
llvm-dev at lists.llvm.org
Wed Apr 20 05:18:46 PDT 2016
On 04/20/2016 02:58 PM, Renato Golin via llvm-dev wrote:
> Hi Derek,
> I'm not an expert in any of these topics, but I'm excited that you
> guys are doing it. It seems like a missing piece that needs to be
> filled.
> Some comments inline...
> On 17 April 2016 at 22:46, Derek Bruening via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> We would prefer to trade off accuracy and build a
>> less-accurate tool below our overhead ceiling than to build a high-accuracy
>> but slow tool.
> I agree with this strategy.
> As a first approach, make it as fast as you can, then later
> introduce more probes, maybe via some slider flag (like -ON) to
> consciously trade speed for accuracy.
>> Studying instruction cache behavior with compiler
>> instrumentation can be challenging, however, so we plan to at least
>> initially focus on data performance.
> I'm interested in how you're going to do this without kernel profiling
> probes, like perf.
> Or is the point here introducing syscalls in the right places instead
> of randomly profiled? Wouldn't that bias your results?
>> Many of our planned tools target specific performance issues with data
>> accesses. They employ the technique of *shadow memory* to store metadata
>> about application data references, using the compiler to instrument loads
>> and stores with code to update the shadow memory.
> Is it just counting the number of reads/writes? Or are you going to
> add how many of those accesses were hit by a cache miss?
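To make the shadow-memory idea concrete, here is a rough Python simulation of the technique the RFC describes: every instrumented load/store maps its address to a shadow counter keyed by cache line. The class and names are illustrative, not the proposed implementation.

```python
# Hypothetical sketch of shadow memory for data references: each
# 64-byte application cache line maps to one shadow counter, and every
# instrumented load/store bumps that counter.

CACHE_LINE = 64

class ShadowMemory:
    def __init__(self):
        # shadow metadata: cache-line index -> access count
        self.counters = {}

    def record_access(self, addr):
        line = addr // CACHE_LINE  # which cache line was touched
        self.counters[line] = self.counters.get(line, 0) + 1

shadow = ShadowMemory()
for addr in (0, 8, 16, 64, 128, 130):  # simulated load/store addresses
    shadow.record_access(addr)

print(shadow.counters)  # {0: 3, 1: 1, 2: 2}
```

A real tool would of course use a flat shadow region addressed by shifting the application address, not a hash table, to keep the per-access cost to a few instructions.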
>> *Cache fragmentation*: this tool gathers data structure field hotness
>> information, looking for data layout optimization opportunities by grouping
>> hot fields together to avoid data cache fragmentation. Future enhancements
>> may add field affinity information if it can be computed with low enough
>> overhead.
> It would also be good to have temporal information, so that you can
> correlate data accesses that occur, for example, inside the same loop /
> basic block, or in sequence in the common CFG flow. This could lead to
> changes in allocation patterns (heap, BSS).
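The payoff of the field-hotness data would be a layout suggestion. Here is a toy sketch of that last step, assuming the tool has already collected per-field access counts via shadow memory; the field names and counts are made up.

```python
# Given per-field access counts, order fields hottest-first so the hot
# ones pack into the leading cache lines and cold ones fall off the end.

def group_hot_fields(field_counts):
    """Return field names ordered hottest-first."""
    return [f for f, _ in sorted(field_counts.items(),
                                 key=lambda kv: kv[1], reverse=True)]

counts = {"refcount": 9000, "flags": 8500, "debug_name": 3, "pad": 0}
print(group_hot_fields(counts))
# ['refcount', 'flags', 'debug_name', 'pad']
```

Field-affinity data (which fields are accessed together) would refine this beyond a simple hotness sort.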
>> *Working set measurement*: this tool measures the data working set size of
>> an application at each snapshot during execution. It can help to understand
>> phased behavior as well as providing basic direction for further effort by
>> the developer: e.g., knowing whether the working set is close to fitting in
>> current L3 caches or is many times larger can help determine where to spend
>> effort.
> This is interesting, but most useful when your dataset changes size
> over different runs. This is similar to running the program under perf
> for different workloads, and I'm not sure how you're going to get that
> in a single run. It also comes with the additional problem that cache
> sizes are not always advertised, so you might have an additional tool
> to guess the sizes based on increasing the size of data blocks and
> finding steps on the data access graph.
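One way to read the proposal is that shadow memory makes this a single-run measurement: mark each cache line on first touch, count the marked lines at each snapshot, then clear and repeat. A rough simulation of that loop (names are illustrative):

```python
# Working-set sketch: count distinct cache lines touched between
# snapshots, report the size, clear the shadow bits, and repeat.

CACHE_LINE = 64

class WorkingSet:
    def __init__(self):
        self.touched = set()  # shadow "was this line touched" bits

    def access(self, addr):
        self.touched.add(addr // CACHE_LINE)

    def snapshot(self):
        size = len(self.touched) * CACHE_LINE  # working set in bytes
        self.touched.clear()                   # reset for next phase
        return size

ws = WorkingSet()
for addr in range(0, 4096, 8):  # phase 1 touches 4 KB of data
    ws.access(addr)
print(ws.snapshot())  # 4096
```

That gives per-phase working-set sizes without needing to know the machine's cache sizes; comparing them against known L2/L3 capacities is then left to the developer.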
>> *Dead store detection*: this tool identifies dead stores (write-after-write
>> patterns with no intervening read) as well as redundant stores (writes of
>> the same value already in memory). Xref the Deadspy paper from CGO 2012.
> This should probably be spotted by the compiler, so I guess it's a
> tool for compiler developers to spot missed optimisation opportunities
> in the back-end.
Not when the dead store happens in an external DSO where the compiler can't
detect it (the same applies to single references).
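For what a runtime check like this has to track, here is a heavily simplified, Deadspy-style model: remember whether the last operation on each address was a store; a store landing on such an address with no intervening read is flagged.

```python
# Toy dead-store detector: a store to an address whose last operation
# was also a store (no read in between) is a write-after-write.

class DeadStoreDetector:
    def __init__(self):
        self.last_was_store = {}  # shadow state per address
        self.dead_stores = []

    def store(self, addr):
        if self.last_was_store.get(addr):
            self.dead_stores.append(addr)  # prior store was never read
        self.last_was_store[addr] = True

    def load(self, addr):
        self.last_was_store[addr] = False  # a read "uses" the store

d = DeadStoreDetector()
d.store(0x10); d.store(0x10)                 # dead: overwritten unread
d.store(0x20); d.load(0x20); d.store(0x20)   # fine: read intervenes
print(d.dead_stores)  # [16]
```

The redundant-store variant (writing a value already in memory) would additionally shadow the stored value, at extra cost.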
>> *Single-reference*: this tool identifies data cache lines brought in but
>> only read once. These could be candidates for non-temporal loads.
> That's nice and should be simple enough to get a report in the end.
> This also seems to be a hint to compiler developers rather than users.
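The report itself is indeed simple once per-line read counts exist; a minimal sketch of the final filtering step (assuming shadow memory has already counted reads per cache line):

```python
# Single-reference report: list cache lines that were brought in but
# read only once -- candidates for non-temporal loads.

CACHE_LINE = 64

def single_reference_lines(read_addrs):
    counts = {}
    for addr in read_addrs:
        line = addr // CACHE_LINE
        counts[line] = counts.get(line, 0) + 1
    return sorted(line for line, n in counts.items() if n == 1)

reads = [0, 8, 64, 128, 128]  # line 0 twice, line 1 once, line 2 twice
print(single_reference_lines(reads))  # [1]
```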
> I think you guys have a nice set of tools to develop and I'm looking
> forward to working with them.
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org