[LLVMdev] [Polly] Question about Polly's speed up on huffbench.c without optimization and code generation

Tobias Grosser tobias at grosser.es
Mon Aug 5 22:29:50 PDT 2013


On 08/05/2013 08:08 PM, Star Tan wrote:
> Hi all,
>
>
>   It seems that Polly could still speed up test-suite/SingleSource/Benchmarks/CoyoteBench/huffbench.c even without any optimization and code generation. Our evaluation shows that when compiled with "clang -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none", the execution time of huffbench is reduced to 15 secs from the original 19 secs without Polly.
>
>
> By investigating Polly's canonicalization passes, I found that the speedup mainly comes from "createIndVarSimplifyPass()", which is controlled by the variable SCEVCodegen:
>
>
>      if (!SCEVCodegen)
>         PM.add(polly::createIndVarSimplifyPass());
>
>
> If we remove this canonicalization pass, then there is no performance improvement.
>
>
> Could anyone give me some hints on why Polly needs this canonicalization pass in the normal case but skips it in the SCEVCodegen case? Is it possible to remove this canonicalization pass altogether?

Hi Star,

polly::createIndVarSimplifyPass() is used in Polly to create canonical 
induction variables when we do not use the SCEV-based code generation. 
For the SCEV-based code generation this pass is no longer needed; in 
fact, one motivation for writing the SCEV-based code generation was to 
remove the need for this pass. The pass still exists because we have 
not yet fully tested the SCEV-based code generation, and the classical 
code generation requires canonical induction variables.
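
To make the terminology concrete, here is a rough source-level sketch 
of what a canonical induction variable looks like. The pass itself 
rewrites LLVM IR, and the C++ loops below are made up purely for 
illustration:

    #include <cstddef>

    // Non-canonical induction variable: the counter runs downwards.
    // (Loop made up for illustration only.)
    long sum_down(const long *A, std::size_t N) {
      long S = 0;
      for (std::size_t i = N; i > 0; --i)
        S += A[i - 1];
      return S;
    }

    // Canonical form, roughly what IndVarSimplify produces: a single
    // counter that starts at 0 and steps by 1 up to the trip count.
    long sum_up(const long *A, std::size_t N) {
      long S = 0;
      for (std::size_t i = 0; i < N; ++i)
        S += A[i];
      return S;
    }

The classical Polly code generation relies on loops being in the 
second form.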

Regarding the speedup due to Polly: it seems the rewrites introduced by 
createIndVarSimplifyPass happen to yield faster code. If you can easily 
produce a reduced test case that shows a missing optimization, it would 
be great to get a bug report for this. On the other hand, I remember 
that the induction variable canonicalization was removed because it 
introduced unpredictable performance regressions (and possibly 
improvements?). Hence, I would not spend too much time tracking this 
down unless there is an obvious missed optimization.

Cheers,
Tobi


