[LLVMdev] [Polly] Compile-time and Execution-time analysis for the SCEV canonicalization

Sun Sep 8 22:07:07 PDT 2013

On 09/09/2013 05:18 AM, Star Tan wrote:
>
> At 2013-09-09 05:52:35,"Tobias Grosser" <tobias at grosser.es> wrote:
>
>> On 09/08/2013 08:03 PM, Star Tan wrote:
>> Also, I wonder if your runs include the dependence analysis. If this is
>> the case, the numbers are very good. Otherwise, 30% overhead seems still
>> to be a little bit much.
> I think no Polly dependence analysis is involved, since our compile command is:
> clang -O3 -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none  -mllvm -polly-codegen-scev
> Fortunately, with the option "-polly-codegen-scev", only three benchmarks show >20% extra compile-time overhead:

I believe so too, but please verify with -debug-pass=Structure.

> SingleSource/Benchmarks/Misc/flops	28.57%
> MultiSource/Benchmarks/MiBench/security-sha/security-sha	22.22%
> MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes	21.05%
> When I look into the compile time for the flops benchmark using "-ftime-report", I find that the extra compile-time overhead mainly comes from the "Combine redundant instructions" pass.
> The top 5 passes when compiled with the Polly canonicalization passes are:
>     ---User Time---   --User+System--   ---Wall Time---  --- Name ---
>     0.0160 ( 20.0%)   0.0160 ( 20.0%)   0.0164 ( 20.8%)  Combine redundant instructions
>     0.0120 ( 15.0%)   0.0120 ( 15.0%)   0.0138 ( 17.5%)  X86 DAG->DAG Instruction Selection
>     0.0040 (  5.0%)   0.0040 (  5.0%)   0.0045 (  5.7%)  Greedy Register Allocator
>     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0029 (  3.7%)  Global Value Numbering
>     0.0040 (  5.0%)   0.0040 (  5.0%)   0.0028 (  3.6%)  Polly - Create polyhedral description of Scops
>
> But the top 5 passes for clang are:
>     ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>     0.0120 ( 25.0%)   0.0000 (  0.0%)   0.0120 ( 21.4%)   0.0141 ( 25.2%)  X86 DAG->DAG Instruction Selection
>     0.0040 (  8.3%)   0.0000 (  0.0%)   0.0040 (  7.1%)   0.0047 (  8.4%)  Greedy Register Allocator
>     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0034 (  6.1%)  Combine redundant instructions
>     0.0000 (  0.0%)   0.0040 ( 50.0%)   0.0040 (  7.1%)   0.0029 (  5.2%)  Global Value Numbering
>     0.0040 (  8.3%)   0.0000 (  0.0%)   0.0040 (  7.1%)   0.0029 (  5.2%)  Combine redundant instructions
> We can see that "Combine redundant instructions" is invoked several times, and the extra invocation caused by Polly's canonicalization is the most expensive one. We have seen this problem before, and I need to look into the details of the canonicalization passes related to "Combine redundant instructions".

OK.
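
Since "Combine redundant instructions" shows up more than once per report, it may help to sum the wall time per pass name before comparing the two runs. A minimal sketch, assuming the row layout quoted above (the function name and the SAMPLE string are made up for illustration; the rows are copied from the clang report above):

```python
import re
from collections import defaultdict

# One "value ( pct%)" column of a -ftime-report row, e.g. "0.0141 ( 25.2%)".
COLUMN = re.compile(r"(\d+\.\d+)\s*\(\s*\d+\.\d+%\)")

# Rows copied from the plain clang report quoted above.
SAMPLE = """\
0.0120 ( 25.0%)   0.0000 (  0.0%)   0.0120 ( 21.4%)   0.0141 ( 25.2%)  X86 DAG->DAG Instruction Selection
0.0040 (  8.3%)   0.0000 (  0.0%)   0.0040 (  7.1%)   0.0047 (  8.4%)  Greedy Register Allocator
0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0034 (  6.1%)  Combine redundant instructions
0.0000 (  0.0%)   0.0040 ( 50.0%)   0.0040 (  7.1%)   0.0029 (  5.2%)  Global Value Numbering
0.0040 (  8.3%)   0.0000 (  0.0%)   0.0040 (  7.1%)   0.0029 (  5.2%)  Combine redundant instructions
"""

def wall_time_by_pass(report):
    """Sum the last timing column (wall time) over all rows with the same pass name."""
    totals = defaultdict(float)
    for line in report.splitlines():
        line = line.lstrip("> ").rstrip()  # tolerate mail-quoted rows
        columns = list(COLUMN.finditer(line))
        if not columns:
            continue  # header or non-table line
        name = line[columns[-1].end():].strip()
        totals[name] += float(columns[-1].group(1))
    return dict(totals)

if __name__ == "__main__":
    totals = wall_time_by_pass(SAMPLE)
    for name, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{seconds:.4f}  {name}")
```

This only relies on the wall time being the last numeric column, so it should also cope with the three-column layout of the Polly report, but I have not checked it against other -ftime-report layouts.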

> BTW, I want to point out that although SCEV based Polly canonicalization (with -polly-codegen-scev) runs faster than default canonicalization, it can lead to 5 extra compile errors and 3 extra runtime errors as shown on http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=34&baseline=34.
> I have done some basic analysis for one of the compile errors (7zip-benchmark). Results can be viewed on http://llvm.org/bugs/show_bug.cgi?id=17159

Great. I will help look into this starting this weekend.

Cheers,
Tobias