<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><br><span style="white-space: pre-wrap; font-size: 14px; line-height: 1.7;">At 2013-09-09 05:52:35,"Tobias Grosser" <tobias@grosser.es> wrote:</span><br><pre>>On 09/08/2013 08:03 PM, Star Tan wrote:

>> Hello all,

>>

>>

>> I have done some basic experiments about Polly canonicalization passes and I found the SCEV canonicalization has significant impact on both compile-time and execution-time performance.

>

>Interesting.

>

>> Detailed results for SCEV and default canonicalization can be viewed on: http://188.40.87.11:8000/db_default/v4/nts/32 (or 33, 34)

>>     *pNoGen with SCEV canonicalization (run 32): -O3 -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none -mllvm -polly-codegen-scev

>>     *pNoGen with default canonicalization (run 33): -O3 -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none

>>     *pBasic without any canonicalization (run 34): -O3 -Xclang -load -Xclang LLVMPolly.so

>>

>>

>> Impact of SCEV canonicalization:

>>      http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=34&baseline=34

>> Impact of default canonicalization:

>>      http://188.40.87.11:8000/db_default/v4/nts/33?compare_to=34&baseline=34

>> Comparison of SCEV canonicalization with default canonicalization:

>>      http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=33&baseline=33

>>

>>

>> As we expected, both SCEV canonicalization and default canonicalization will slightly increase the compile-time overhead (at most 30% extra compile-time). They also lead to some execution-time regressions and improvements.

>>

>>

>> The only difference between SCEV canonicalization and default canonicalization is the "IndVarSimplify" pass as shown in the code RegisterPasses.cpp:212:

>>        if (!SCEVCodegen)

>>          PM.add(polly::createIndVarSimplifyPass());

>

>There are actually more differences (see grep -R SCEVCodegen polly/), 

>but the other differences will mainly be code generation differences.</pre><pre>Thanks for your reminder. Since we are currently focusing on canonicalization passes, the other differences for code generation do not matter.

>> However, I find it is interesting to look into the comparison between SCEV canonicalization and default canonicalization (http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=33&baseline=33):

>

>Yes, this is definitely a good start.

>

>> First of all, we can expect SCEV canonicalization has better compile-time performance since it avoids the "IndVarSimplify" pass. Actually, it can gain more than 5% compile-time performance improvement for 32 benchmarks, especially for the following benchmarks:

>>          MultiSource/Applications/lemon/lemon-11.02%

>>          SingleSource/Benchmarks/Misc/oourafft-10.53%

>>          SingleSource/Benchmarks/Linpack/linpack-pc-10.00%

>>          MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan-8.31%

>>          MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt-8.18%

>>

>>

>> Second, we find that SCEV canonicalization has both regression and improvement of execution performance compared with default canonicalization. Actually, there are many execution-time regressions such as:

>>          SingleSource/Benchmarks/Shootout/nestedloop+16363.64%

>>          SingleSource/Benchmarks/Shootout-C++/nestedloop+16200.00%</pre><pre>>Those two have a huge impact. Understanding what is going on here would 

>be nice.</pre><pre>Yes, I am investigating these cases.</pre><pre>>> I think the execution-time performance regression is mainly because of the unexpected performance improvements from non-SCEV canonicalization as shown in the following bug: http://llvm.org/bugs/show_bug.cgi?id=17153. I will try to find out why "IndVarSimplify" can produce better code in the next step. If we can eliminate "IndVarSimplify" canonicalization but keep producing high-quality code, then we can gain better compile-time performance without execution-time performance loss.

>

>Previous experience has shown that the indvars pass as we run it in 

>Polly can unpredictably change performance both negatively and 

>positively. It was disabled as people did not manage to eliminate all 

>regressions it introduced, such that the positive performance changes 

>could not really be valued.

>

>So regarding performance tuning, I do not think we need to get this 

>optimal. As soon as -polly-codegen-scev reaches similar performance to

>the original approach, we are fine.</pre><pre>I see. I agree with you. I think we care more about compile-time performance for Polly's canonicalization passes since no Polly optimization or Polly code generation happens here.

>Also, I wonder if your runs include the dependence analysis. If this is 

>the case, the numbers are very good. Otherwise, 30% overhead seems still 

>to be a little bit much.</pre><pre>I think no Polly dependence analysis is involved, since our compile command is:  </pre><pre>clang<span style="font-size: 14px; line-height: 1.7;"> -O3 -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none </span><span style="font-size: 14px; line-height: 1.7;"> -mllvm -polly-codegen-scev</span></pre><pre>Fortunately, with the option "<span style="font-size: 14px; line-height: 1.7;">-polly-codegen-scev", only three benchmarks show >20% extra compile-time overhead:</span></pre><pre>SingleSource/Benchmarks/Misc/flops 28.57%

MultiSource/Benchmarks/MiBench/security-sha/security-sha        22.22%

MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 21.05%</pre><pre>When I looked into the compile time for the flops benchmark using "-ftime-report", I found that the extra compile-time overhead mainly comes from the "<span style="font-size: 14px; line-height: 1.7;">Combine redundant instructions" pass.</span></pre><pre>The top 5 passes when compiled with Polly canonicalization passes:</pre><pre>   ---User Time---   --User+System--   ---Wall Time---  --- Name ---

   0.0160 ( 20.0%)   0.0160 ( 20.0%)   0.0164 ( 20.8%)  Combine redundant instructions

   0.0120 ( 15.0%)   0.0120 ( 15.0%)   0.0138 ( 17.5%)  X86 DAG->DAG Instruction Selection

   0.0040 (  5.0%)   0.0040 (  5.0%)   0.0045 (  5.7%)  Greedy Register Allocator

   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0029 (  3.7%)  Global Value Numbering

   0.0040 (  5.0%)   0.0040 (  5.0%)   0.0028 (  3.6%)  Polly - Create polyhedral description of Scops

</pre><pre>But the top 5 passes for clang are:</pre><pre>   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---

   0.0120 ( 25.0%)   0.0000 (  0.0%)   0.0120 ( 21.4%)   0.0141 ( 25.2%)  X86 DAG->DAG Instruction Selection

   0.0040 (  8.3%)   0.0000 (  0.0%)   0.0040 (  7.1%)   0.0047 (  8.4%)  Greedy Register Allocator

   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0034 (  6.1%)  Combine redundant instructions

   0.0000 (  0.0%)   0.0040 ( 50.0%)   0.0040 (  7.1%)   0.0029 (  5.2%)  Global Value Numbering

   0.0040 (  8.3%)   0.0000 (  0.0%)   0.0040 (  7.1%)   0.0029 (  5.2%)  Combine redundant instructions</pre><pre>We can see that the "<span style="font-size: 14px; line-height: 1.7;">Combine redundant instructions" pass is invoked many times, and the extra invocation introduced by Polly's canonicalization is the most expensive one. We have seen this problem before, and I need to look into the details of the canonicalization passes related to "</span><span style="font-size: 14px; line-height: 1.7;">Combine redundant instructions".</span></pre><pre>BTW, I want to point out that although SCEV-based Polly canonicalization (with -polly-codegen-scev) runs faster than default canonicalization, it leads to 5 extra compile errors and 3 extra runtime errors, as shown on <a href="http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=34&baseline=34" style="font-size: 14px; line-height: 1.7;">http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=34&baseline=34</a>.</pre><pre>I have done some basic analysis for one of the compile errors (7zip-benchmark). <span style="font-size: 14px; line-height: 1.7;">Results can be viewed on http://llvm.org/bugs/show_bug.cgi?Cid=17159</span></pre><pre>Best,</pre><pre>Star Tan</pre><pre><span style="font-size: 14px; line-height: 1.7;"><br></span></pre><pre><br></pre></div>