[LLVMdev] [Polly] Update of Polly compile-time performance on LLVM test-suite

Wed Jul 31 07:50:57 PDT 2013

On 07/30/2013 10:03 AM, Star Tan wrote:
> Hi Tobias and all Polly developers,
>
> I have re-evaluated the Polly compile-time performance using newest
> LLVM/Polly source code.  You can view the results on
> http://188.40.87.11:8000
> <http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=9&baseline=9&aggregation_fn=median>.
>
> In particular, I also evaluated our r187102 patch, which avoids expensive
> failure-string operations during normal execution. Specifically, I
> evaluated two cases for it:
>
> Polly-NoCodeGen: clang -O3 -load LLVMPolly.so -mllvm
> -polly-optimizer=none -mllvm -polly-code-generator=none
> http://188.40.87.11:8000/db_default/v4/nts/16?compare_to=9&baseline=9&aggregation_fn=median
> Polly-Opt: clang -O3 -load LLVMPolly.so -mllvm -polly
> http://188.40.87.11:8000/db_default/v4/nts/18?compare_to=11&baseline=11&aggregation_fn=median
>
> The "Polly-NoCodeGen" case is mainly used to compare the compile-time
> performance of the polly-detect pass. As the results show, our
> patch significantly reduces the compile-time overhead for some
> benchmarks, such as tramp3dv4
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.355=2> (24.2%),
> simple_types_constant_folding
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.366=2> (12.6%),
> oggenc
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.331=2> (9.1%),
> and loop_unroll
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.235=2> (7.8%).
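
The general idea behind avoiding failure strings in normal execution can be sketched as follows. This is not the r187102 patch itself; `DetectionContext`, `RejectReason`, `reject` and `isValidLoopBound` are hypothetical names. The point it illustrates is that a rejection always records a cheap enum, while the human-readable message is produced by a callable that only runs when someone has asked to see the messages.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; Polly's real ScopDetection classes look different.
enum class RejectReason { NonAffineLoopBound };

struct DetectionContext {
  bool Verbose = false;              // true only when someone will read the messages
  unsigned RejectCount = 0;          // cheap bookkeeping, always updated
  std::vector<RejectReason> Reasons; // cheap enum record, always kept
  std::vector<std::string> Messages; // expensive text, built only if Verbose
};

// BuildMessage is a callable, so the string formatting inside it runs only
// when Ctx.Verbose is set; in normal execution it is never invoked.
template <typename MsgFn>
bool reject(DetectionContext &Ctx, RejectReason R, MsgFn BuildMessage) {
  ++Ctx.RejectCount;
  Ctx.Reasons.push_back(R);
  if (Ctx.Verbose)
    Ctx.Messages.push_back(BuildMessage());
  return false; // the candidate region is not a valid scop
}

// Hypothetical check: Affine and BoundText stand in for a real SCEV analysis.
bool isValidLoopBound(DetectionContext &Ctx, bool Affine,
                      const std::string &BoundText) {
  if (!Affine)
    return reject(Ctx, RejectReason::NonAffineLoopBound,
                  [&] { return "Non-affine loop bound: " + BoundText; });
  return true;
}
```

Keeping the enum record in every mode means rejection statistics stay available cheaply, while the string concatenation disappears from the common path.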

Very nice!

However, I am surprised to also see some performance regressions. They
all occur in very short-running kernels, so they may well be measurement
noise. Is that really the case?
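
One rough way to check this, assuming several timing samples are available per benchmark and configuration, is to compare how far the median moved against the run-to-run spread of the samples. The sketch below uses hypothetical helper names and a deliberately simple heuristic; it is not how LNT itself decides whether a change is significant.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty sample set (taken by value so it can be sorted).
double median(std::vector<double> V) {
  assert(!V.empty());
  std::sort(V.begin(), V.end());
  std::size_t N = V.size();
  return N % 2 ? V[N / 2] : (V[N / 2 - 1] + V[N / 2]) / 2.0;
}

// Relative change of the median compile time in percent (negative = faster).
double relativeChangePercent(const std::vector<double> &Base,
                             const std::vector<double> &New) {
  double B = median(Base);
  return (median(New) - B) / B * 100.0;
}

// Rough heuristic: call a change noise if the medians moved by less than the
// spread (max - min) of either sample set.
bool looksLikeNoise(const std::vector<double> &Base,
                    const std::vector<double> &New) {
  auto Spread = [](const std::vector<double> &V) {
    auto [Lo, Hi] = std::minmax_element(V.begin(), V.end());
    return *Hi - *Lo;
  };
  double Shift = std::fabs(median(New) - median(Base));
  return Shift < std::max(Spread(Base), Spread(New));
}
```

With this heuristic, a hypothetical kernel that compiles in about 10 ms can show a 20% change in its median and still count as noise if individual runs vary by several milliseconds.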

Also, it may be interesting to compare against the non-Polly case to see
how much overhead is still caused by our scop detection.

> The "Polly-Opt" case is used to compare the compile-time performance
> of Polly as a whole. Since our patch mainly affects the
> Polly-Detect pass, the results are similar to "Polly-NoCodeGen". As
> the results show, it reduces the compile-time overhead of some
> benchmarks, such as tramp3dv4
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.355=2> (23.7%),
> simple_types_constant_folding
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.366=2> (12.9%),
> oggenc
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.331=2> (8.3%),
> and loop_unroll
> <http://188.40.87.11:8000/db_default/v4/nts/16/graph?test.235=2> (7.5%).
>
> Finally, I also evaluated the performance of the ScopBottomUp patch, which
> changes the top-down scop detection into a bottom-up scop detection.
> The results can be viewed here:
> pNoCodeGen-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s.
> LLVMPolly-ScopBottomUp.so)  -mllvm -polly-optimizer=none -mllvm
> -polly-code-generator=none
> http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median
> pOpt-ScopBottomUp: clang -O3 -load LLVMPolly.so (v.s.
> LLVMPolly-ScopBottomUp.so)  -mllvm -polly
> http://188.40.87.11:8000/db_default/v4/nts/19?compare_to=18&baseline=18&aggregation_fn=median
> (*Both of these results are based on LLVM r187116, which already includes
> the r187102 patch discussed above.)
>
> Please note that this patch causes some failures in the
> Polly tests, so the data shown here should not be regarded as reliable
> results. For example, this patch significantly reduces the
> compile-time overhead of SingleSource/Benchmarks/Shootout/nestedloop
> <http://188.40.87.11:8000/db_default/v4/nts/19/graph?test.17=2> only
> because it treats the nested loop as an invalid scop and skips all
> subsequent transformations and optimizations. Still, I evaluated it
> here to get a sense of its potential performance impact. Based on the
> results shown at
> http://188.40.87.11:8000/db_default/v4/nts/21?compare_to=16&baseline=16&aggregation_fn=median,
> detecting scops bottom-up may reduce Polly's compile time by a further
> 10% or more.
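
To make the top-down versus bottom-up distinction concrete, here is a simplified sketch over a hypothetical region tree. It is not the ScopBottomUp patch: it assumes that a region can only be a valid scop if every region nested inside it is valid, and it counts how many regions each strategy inspects. Top-down checks the largest region first and, on failure, re-checks each child from scratch; bottom-up checks each region once and only tries a parent when all of its children are already known to be valid.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified region tree; Polly itself works on LLVM's RegionInfo.
struct Region {
  std::string Name;
  bool LocallyValid;            // stand-in for "passes the checks on its own blocks"
  std::vector<Region> Children; // directly nested subregions
};

// ASSUMPTION: a region is a valid scop iff it and all nested regions are
// locally valid. Cost counts how many regions are inspected.
bool fullCheck(const Region &R, unsigned &Cost) {
  ++Cost;
  if (!R.LocallyValid)
    return false;
  for (const Region &C : R.Children)
    if (!fullCheck(C, Cost))
      return false;
  return true;
}

// Top-down: try the largest region first; on failure, restart in each child.
void topDown(const Region &R, std::vector<std::string> &Scops, unsigned &Cost) {
  if (fullCheck(R, Cost)) {
    Scops.push_back(R.Name);
    return;
  }
  for (const Region &C : R.Children)
    topDown(C, Scops, Cost);
}

// Bottom-up: validate children first and inspect R itself only if they all
// passed. Returns whether R is valid; maximal valid regions go into Scops.
bool bottomUp(const Region &R, std::vector<std::string> &Scops, unsigned &Cost) {
  std::vector<std::string> ChildScops;
  bool AllChildrenValid = true;
  for (const Region &C : R.Children)
    AllChildrenValid = bottomUp(C, ChildScops, Cost) && AllChildrenValid;
  bool Valid = AllChildrenValid;
  if (Valid) {
    ++Cost; // only R's own blocks; nested regions were already checked
    Valid = R.LocallyValid;
  }
  if (Valid)
    Scops.push_back(R.Name); // R subsumes the scops found in its children
  else
    Scops.insert(Scops.end(), ChildScops.begin(), ChildScops.end());
  return Valid;
}
```

Under this assumption both strategies report the same maximal scops, but the top-down walk repeats work at every level where detection fails. Whether that saving carries over to Polly depends on how well the assumption holds for real regions.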

Interesting. For some reason it also regresses huffbench quite a bit.
:-( I think an up-to-date comparison of non-Polly versus Polly would come
in handy here, to see on which benchmarks we still have larger performance
regressions, and whether the bottom-up scop detection actually helps with
them. As this is a larger patch, we should have a real need for it before
switching to it.

Cheers,
Tobias