[LLVMdev] [Polly] Compile-time and Execution-time analysis for the SCEV canonicalization

Tobias Grosser tobias at grosser.es
Sun Sep 8 14:52:35 PDT 2013


On 09/08/2013 08:03 PM, Star Tan wrote:
> Hello all,
>
>
> I have done some basic experiments on Polly's canonicalization passes and found that the SCEV canonicalization has a significant impact on both compile-time and execution-time performance.

Interesting.

> Detailed results for SCEV and default canonicalization can be viewed at: http://188.40.87.11:8000/db_default/v4/nts/32 (or 33, 34)
>     *pNoGen with SCEV canonicalization (run 32): -O3 -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none -mllvm -polly-codegen-scev
>     *pNoGen with default canonicalization (run 33): -O3 -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none
>     *pBasic without any canonicalization (run 34): -O3 -Xclang -load -Xclang LLVMPolly.so
>
>
> Impact of SCEV canonicalization:
>      http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=34&baseline=34
> Impact of default canonicalization:
>      http://188.40.87.11:8000/db_default/v4/nts/33?compare_to=34&baseline=34
> Comparison of SCEV canonicalization with default canonicalization:
>      http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=33&baseline=33
>
>
> As we expected, both SCEV canonicalization and default canonicalization increase the compile-time overhead (by at most 30% extra compile time). They also lead to some execution-time regressions and improvements.
>
>
> The only difference between SCEV canonicalization and default canonicalization is the "IndVarSimplify" pass, as shown at RegisterPasses.cpp:212:
>        if (!SCEVCodegen)
>          PM.add(polly::createIndVarSimplifyPass());

There are actually more differences (see grep -R SCEVCodegen polly/), 
but the remaining ones mainly affect code generation.
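
For reference, the guarded registration is easy to reproduce outside of 
Polly. Below is a minimal, self-contained sketch of the pattern against 
the LLVM headers of that time; it uses LLVM's own IndVarSimplify instead 
of Polly's private copy, and the helper name is made up, so it is an 
illustration rather than the actual RegisterPasses.cpp code:

  #include "llvm/PassManager.h"
  #include "llvm/Transforms/Scalar.h"

  // Sketch of the guard quoted above: with -polly-codegen-scev the
  // induction variable canonicalization is simply left out of the
  // preparation sequence.
  static void addCanonicalization(llvm::PassManager &PM, bool SCEVCodegen) {
    if (!SCEVCodegen)
      PM.add(llvm::createIndVarSimplifyPass());
  }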

> However, I find it interesting to look into the comparison between SCEV canonicalization and default canonicalization (http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=33&baseline=33):

Yes, this is definitely a good start.

> First of all, we can expect SCEV canonicalization to have better compile-time performance, since it avoids the "IndVarSimplify" pass. Indeed, it gains more than 5% in compile time on 32 benchmarks, most notably the following:
>          MultiSource/Applications/lemon/lemon: -11.02%
>          SingleSource/Benchmarks/Misc/oourafft: -10.53%
>          SingleSource/Benchmarks/Linpack/linpack-pc: -10.00%
>          MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: -8.31%
>          MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt: -8.18%
>
>
> Second, we find that SCEV canonicalization shows both regressions and improvements in execution-time performance compared with default canonicalization. In particular, there are some large execution-time regressions, such as:
>          SingleSource/Benchmarks/Shootout/nestedloop: +16363.64%
>          SingleSource/Benchmarks/Shootout-C++/nestedloop: +16200.00%

Those two have a huge impact. Understanding what is going on here would 
be nice.
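
If I remember correctly, the nestedloop kernels are essentially a deep 
loop nest whose only work is incrementing a counter. The sketch below 
shows the rough shape from memory (variable names and the nesting depth 
are my guesses, not the actual benchmark source); any pipeline that 
leaves the induction variables in a form the later passes can fold away 
computes the result at compile time, which would explain a swing of this 
magnitude:

  #include <cstdio>
  #include <cstdlib>

  // Rough sketch of a nestedloop-style kernel: the program only counts
  // how often the innermost body executes, i.e. it computes n^6.
  int main(int argc, char **argv) {
    int n = argc > 1 ? std::atoi(argv[1]) : 16;
    long x = 0;
    for (int a = 0; a < n; ++a)
      for (int b = 0; b < n; ++b)
        for (int c = 0; c < n; ++c)
          for (int d = 0; d < n; ++d)
            for (int e = 0; e < n; ++e)
              for (int f = 0; f < n; ++f)
                ++x;
    std::printf("%ld\n", x);
    return 0;
  }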

> I think the execution-time regressions are mainly caused by unexpected performance improvements from the non-SCEV (default) canonicalization, as shown in the following bug: http://llvm.org/bugs/show_bug.cgi?id=17153. As a next step, I will try to find out why "IndVarSimplify" can produce better code. If we can drop the "IndVarSimplify" canonicalization while still producing high-quality code, we can gain compile-time performance without losing execution-time performance.

Previous experience has shown that the indvars pass, as we run it in 
Polly, can change performance unpredictably in both directions. It was 
disabled because people did not manage to eliminate all the regressions 
it introduced, so its positive performance changes could not really be 
counted in its favor.
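
To make it concrete what kind of rewrite the pass performs: it 
canonicalizes induction variables so that loops count from zero with 
unit stride and recomputes the original variable from the canonical one. 
The example below is hand-written for this discussion (the function 
names and the loop itself are made up), not actual pass output:

  // Before: the induction variable starts at 3 and steps by 5.
  void scale(float *A, long N) {
    for (long i = 3; i < N; i += 5)
      A[i] *= 2.0f;
  }

  // After an IndVarSimplify-style rewrite: a canonical counter starts at
  // 0 with unit stride, the trip count is computed up front, and the
  // original induction variable is expressed in terms of the counter.
  void scale_canonical(float *A, long N) {
    long TripCount = N > 3 ? (N - 3 + 4) / 5 : 0; // iterations of the loop above
    for (long iv = 0; iv < TripCount; ++iv)
      A[3 + 5 * iv] *= 2.0f;
  }

Depending on the surrounding code, later passes sometimes handle one 
form better than the other, which matches the mixed results above.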

So regarding performance tuning, I do not think we need to make this 
optimal. As soon as -polly-codegen-scev reaches performance similar to 
the original approach, we are fine.

Also, I wonder if your runs include the dependence analysis. If they do, 
the numbers are very good. Otherwise, 30% overhead still seems a little 
high.

Tobi



