[llvm-dev] RFC: EfficiencySanitizer
Yury Gribov via llvm-dev
llvm-dev at lists.llvm.org
Wed Apr 20 05:18:46 PDT 2016
On 04/20/2016 02:58 PM, Renato Golin via llvm-dev wrote:
> Hi Derek,
> I'm not an expert in any of these topics, but I'm excited that you
> guys are doing it. It seems like a missing piece that needs to be
> filled.
> Some comments inline...
> On 17 April 2016 at 22:46, Derek Bruening via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> We would prefer to trade off accuracy and build a
>> less-accurate tool below our overhead ceiling than to build a high-accuracy
>> but slow tool.
> I agree with this strategy.
> As a first approach, make it as fast as you can, then later
> introduce more probes, maybe via some slider flag (like -ON) to
> consciously trade speed for accuracy.
>> Studying instruction cache behavior with compiler
>> instrumentation can be challenging, however, so we plan to at least
>> initially focus on data performance.
> I'm interested in how you're going to do this without kernel profiling
> probes, like perf.
> Or is the point here introducing syscalls in the right places instead
> of randomly profiled? Wouldn't that bias your results?
>> Many of our planned tools target specific performance issues with data
>> accesses. They employ the technique of *shadow memory* to store metadata
>> about application data references, using the compiler to instrument loads
>> and stores with code to update the shadow memory.
> Is it just counting the number of reads/writes? Or are you going to
> add how many of those accesses were hit by a cache miss?
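To make the shadow-memory idea concrete, here is a rough Python simulation of the technique the RFC describes: every instrumented load/store maps its address to a shadow counter keyed by cache line. The class and names are illustrative, not the proposed implementation.

```python
# Hypothetical sketch of shadow memory for data references: each
# 64-byte application cache line maps to one shadow counter, and every
# instrumented load/store bumps that counter.

CACHE_LINE = 64

class ShadowMemory:
    def __init__(self):
        # shadow metadata: cache-line index -> access count
        self.counters = {}

    def record_access(self, addr):
        line = addr // CACHE_LINE  # which cache line was touched
        self.counters[line] = self.counters.get(line, 0) + 1

shadow = ShadowMemory()
for addr in (0, 8, 16, 64, 128, 130):  # simulated load/store addresses
    shadow.record_access(addr)

print(shadow.counters)  # {0: 3, 1: 1, 2: 2}
```

A real tool would of course use a flat shadow region addressed by shifting the application address, not a hash table, to keep the per-access cost to a few instructions.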
>> *Cache fragmentation*: this tool gathers data structure field hotness
>> information, looking for data layout optimization opportunities by grouping
>> hot fields together to avoid data cache fragmentation. Future enhancements
>> may add field affinity information if it can be computed with low enough
>> overhead.
> It would also be good to have temporal information, so that you can
> correlate data accesses that occur, for example, inside the same loop /
> basic block, or in sequence in the common CFG flow. This could lead to
> changes in allocation patterns (heap, BSS).
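The payoff of the field-hotness data would be a layout suggestion. Here is a toy sketch of that last step, assuming the tool has already collected per-field access counts via shadow memory; the field names and counts are made up.

```python
# Given per-field access counts, order fields hottest-first so the hot
# ones pack into the leading cache lines and cold ones fall off the end.

def group_hot_fields(field_counts):
    """Return field names ordered hottest-first."""
    return [f for f, _ in sorted(field_counts.items(),
                                 key=lambda kv: kv[1], reverse=True)]

counts = {"refcount": 9000, "flags": 8500, "debug_name": 3, "pad": 0}
print(group_hot_fields(counts))
# ['refcount', 'flags', 'debug_name', 'pad']
```

Field-affinity data (which fields are accessed together) would refine this beyond a simple hotness sort.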
>> *Working set measurement*: this tool measures the data working set size of
>> an application at each snapshot during execution. It can help to understand
>> phased behavior as well as providing basic direction for further effort by
>> the developer: e.g., knowing whether the working set is close to fitting in
>> current L3 caches or is many times larger can help determine where to spend
>> effort.
> This is interesting, but most useful when your dataset changes size
> over different runs. This is similar to running the program under perf
> for different workloads, and I'm not sure how you're going to get that
> in a single run. It also comes with the additional problem that cache
> sizes are not always advertised, so you might have an additional tool
> to guess the sizes based on increasing the size of data blocks and
> finding steps on the data access graph.
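One way to read the proposal is that shadow memory makes this a single-run measurement: mark each cache line on first touch, count the marked lines at each snapshot, then clear and repeat. A rough simulation of that loop (names are illustrative):

```python
# Working-set sketch: count distinct cache lines touched between
# snapshots, report the size, clear the shadow bits, and repeat.

CACHE_LINE = 64

class WorkingSet:
    def __init__(self):
        self.touched = set()  # shadow "was this line touched" bits

    def access(self, addr):
        self.touched.add(addr // CACHE_LINE)

    def snapshot(self):
        size = len(self.touched) * CACHE_LINE  # working set in bytes
        self.touched.clear()                   # reset for next phase
        return size

ws = WorkingSet()
for addr in range(0, 4096, 8):  # phase 1 touches 4 KB of data
    ws.access(addr)
print(ws.snapshot())  # 4096
```

That gives per-phase working-set sizes without needing to know the machine's cache sizes; comparing them against known L2/L3 capacities is then left to the developer.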
>> *Dead store detection*: this tool identifies dead stores (write-after-write
>> patterns with no intervening read) as well as redundant stores (writes of
>> the same value already in memory). Xref the Deadspy paper from CGO 2012.
> This should probably be spotted by the compiler, so I guess it's a
> tool for compiler developers to spot missed optimisation opportunities
> in the back-end.
Not when the dead store happens in an external DSO where the compiler can't
detect it (the same applies to single references).
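For what a runtime check like this has to track, here is a heavily simplified, Deadspy-style model: remember whether the last operation on each address was a store; a store landing on such an address with no intervening read is flagged.

```python
# Toy dead-store detector: a store to an address whose last operation
# was also a store (no read in between) is a write-after-write.

class DeadStoreDetector:
    def __init__(self):
        self.last_was_store = {}  # shadow state per address
        self.dead_stores = []

    def store(self, addr):
        if self.last_was_store.get(addr):
            self.dead_stores.append(addr)  # prior store was never read
        self.last_was_store[addr] = True

    def load(self, addr):
        self.last_was_store[addr] = False  # a read "uses" the store

d = DeadStoreDetector()
d.store(0x10); d.store(0x10)                 # dead: overwritten unread
d.store(0x20); d.load(0x20); d.store(0x20)   # fine: read intervenes
print(d.dead_stores)  # [16]
```

The redundant-store variant (writing a value already in memory) would additionally shadow the stored value, at extra cost.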
>> *Single-reference*: this tool identifies data cache lines brought in but
>> only read once. These could be candidates for non-temporal loads.
> That's nice and should be simple enough to get a report in the end.
> This also seems to be a hint to compiler developers rather than users.
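The report itself is indeed simple once per-line read counts exist; a minimal sketch of the final filtering step (assuming shadow memory has already counted reads per cache line):

```python
# Single-reference report: list cache lines that were brought in but
# read only once -- candidates for non-temporal loads.

CACHE_LINE = 64

def single_reference_lines(read_addrs):
    counts = {}
    for addr in read_addrs:
        line = addr // CACHE_LINE
        counts[line] = counts.get(line, 0) + 1
    return sorted(line for line, n in counts.items() if n == 1)

reads = [0, 8, 64, 128, 128]  # line 0 twice, line 1 once, line 2 twice
print(single_reference_lines(reads))  # [1]
```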
> I think you guys have a nice set of tools to develop and I'm looking
> forward to working with them.
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org