[PATCH] Break dependencies in large loops containing reductions (LoopVectorize)

Mon Feb 23 10:58:03 PST 2015

In http://reviews.llvm.org/D7514#128320, @ohsallen wrote:

> I benchmarked this patch (without the multiply-add nonsense) on POWER8 and got the following speedups:
>
> MultiSource/Benchmarks/McCat/08-main/main
>
>   -59.4169% +/- 40.2649%
>
> MultiSource/Benchmarks/Prolangs-C/fixoutput/fixoutput
>
>   -83.6948% +/- 82.1021%
>
> SingleSource/UnitTests/2003-05-02-DependentPHI
>
>   -33.8917% +/- 31.3964%
>
> SingleSource/UnitTests/2003-07-06-IntOverflow
>
>   -43.784% +/- 38.5492%
>   
>
> And the following slowdowns:
>
> MultiSource/Applications/kimwitu++/kc
>
>   80.6835% +/- 69.7813%
>
> MultiSource/Applications/viterbi/viterbi
>
>   35.2072% +/- 24.8361%
>
> MultiSource/Benchmarks/7zip/7zip-benchmark
>
>   9.8082% +/- 6.31851%
>
> MultiSource/Benchmarks/nbench/nbench
>
>   8.43677% +/- 7.81566%
>   
>
> Will have to investigate whether the slowdowns are related to spills... However it seems that, if we were able to fine tune this, it would be profitable.

Neat; I see you're using my performance benchmarking script. :) I forgot to warn you, however (my fault), that I don't trust any number whose standard deviation is more than about half to two-thirds of its delta. It looks like you need more samples, or quieter runs (likely achievable by running fewer tests in parallel, though taking more samples is probably easier). So, of these, I'd ignore MultiSource/Benchmarks/Prolangs-C/fixoutput/fixoutput, SingleSource/UnitTests/2003-05-02-DependentPHI, SingleSource/UnitTests/2003-07-06-IntOverflow, and probably all of the slowdowns, although the slowdowns are large enough to justify taking more samples. Sorry, this is not completely scientific.
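
To make that rule of thumb concrete, here is a minimal Python sketch that computes the ratio of standard deviation to delta for the numbers quoted above. It is not part of the benchmarking script; the names noise_ratio and is_trustworthy are hypothetical, and the default cutoff of 0.5 is an assumption taken from the lower end of the "half to two-thirds" range, which is a judgment call rather than a hard limit.

```python
# Sketch only: flags benchmark deltas whose standard deviation is large
# relative to the delta itself. Function names and the default cutoff are
# assumptions made for illustration.

def noise_ratio(delta_pct, stddev_pct):
    """Return stddev / |delta|; infinity when the delta is zero."""
    if delta_pct == 0:
        return float("inf")
    return abs(stddev_pct) / abs(delta_pct)


def is_trustworthy(delta_pct, stddev_pct, max_ratio=0.5):
    """True when the standard deviation is at most max_ratio of the delta."""
    return noise_ratio(delta_pct, stddev_pct) <= max_ratio


# (test name, delta in percent, standard deviation in percent), as quoted above.
RESULTS = [
    ("MultiSource/Benchmarks/McCat/08-main/main", -59.4169, 40.2649),
    ("MultiSource/Benchmarks/Prolangs-C/fixoutput/fixoutput", -83.6948, 82.1021),
    ("SingleSource/UnitTests/2003-05-02-DependentPHI", -33.8917, 31.3964),
    ("SingleSource/UnitTests/2003-07-06-IntOverflow", -43.784, 38.5492),
    ("MultiSource/Applications/kimwitu++/kc", 80.6835, 69.7813),
    ("MultiSource/Applications/viterbi/viterbi", 35.2072, 24.8361),
    ("MultiSource/Benchmarks/7zip/7zip-benchmark", 9.8082, 6.31851),
    ("MultiSource/Benchmarks/nbench/nbench", 8.43677, 7.81566),
]

if __name__ == "__main__":
    for name, delta, stddev in RESULTS:
        ratio = noise_ratio(delta, stddev)
        verdict = "ok" if is_trustworthy(delta, stddev) else "needs more samples"
        print(f"{name}: ratio {ratio:.2f} ({verdict})")
```

Every ratio here is above one half, so with the default cutoff each test is reported as needing more samples; passing a looser max_ratio shows how sensitive the verdict is to where the line is drawn.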

You can also go into the directory containing the particular tests in question and run the 'make' commands there (that is certainly faster than re-running everything). And, lastly, you might also want to restrict your testing to those tests known to yield reasonable results for benchmarking (do this by defining BENCHMARKING_ONLY=1 on each 'make' invocation).

As a general note, I think the idea proposed is likely to yield a profitable heuristic; it is just a matter of making sure the implementation makes sense.

http://reviews.llvm.org/D7514
