[LLVMdev] [Polly] Compile-time and Execution-time analysis for the SCEV canonicalization
Tobias Grosser
tobias at grosser.es
Sun Sep 8 22:07:07 PDT 2013
On 09/09/2013 05:18 AM, Star Tan wrote:
>
> At 2013-09-09 05:52:35,"Tobias Grosser" <tobias at grosser.es> wrote:
>
>> On 09/08/2013 08:03 PM, Star Tan wrote:
>> Also, I wonder if your runs include the dependence analysis. If this is
>> the case, the numbers are very good. Otherwise, 30% overhead seems still
>> to be a little bit much.
> I think no Polly dependence analysis is involved, since our compile command is:
> clang -O3 -Xclang -load -Xclang LLVMPolly.so -mllvm -polly -mllvm -polly-optimizer=none -mllvm -polly-code-generator=none -mllvm -polly-codegen-scev
> Fortunately, with the option "-polly-codegen-scev", only three benchmarks show >20% extra compile-time overhead:
I believe so too, but please verify with -debug-pass=Structure.
> SingleSource/Benchmarks/Misc/flops 28.57%
> MultiSource/Benchmarks/MiBench/security-sha/security-sha 22.22%
> MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 21.05%
> When I look into the compile time for the flops benchmark using "-ftime-report", I find that the extra compile-time overhead mainly comes from the "Combine redundant instructions" pass.
> The top 5 passes when compiled with the Polly canonicalization passes are:
>    ---User Time---   --User+System--   ---Wall Time---  --- Name ---
>    0.0160 ( 20.0%)   0.0160 ( 20.0%)   0.0164 ( 20.8%)  Combine redundant instructions
>    0.0120 ( 15.0%)   0.0120 ( 15.0%)   0.0138 ( 17.5%)  X86 DAG->DAG Instruction Selection
>    0.0040 (  5.0%)   0.0040 (  5.0%)   0.0045 (  5.7%)  Greedy Register Allocator
>    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0029 (  3.7%)  Global Value Numbering
>    0.0040 (  5.0%)   0.0040 (  5.0%)   0.0028 (  3.6%)  Polly - Create polyhedral description of Scops
>
> But the top 5 passes for plain clang are:
>    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>    0.0120 ( 25.0%)   0.0000 (  0.0%)   0.0120 ( 21.4%)   0.0141 ( 25.2%)  X86 DAG->DAG Instruction Selection
>    0.0040 (  8.3%)   0.0000 (  0.0%)   0.0040 (  7.1%)   0.0047 (  8.4%)  Greedy Register Allocator
>    0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0034 (  6.1%)  Combine redundant instructions
>    0.0000 (  0.0%)   0.0040 ( 50.0%)   0.0040 (  7.1%)   0.0029 (  5.2%)  Global Value Numbering
>    0.0040 (  8.3%)   0.0000 (  0.0%)   0.0040 (  7.1%)   0.0029 (  5.2%)  Combine redundant instructions
> We can see that "Combine redundant instructions" is invoked many times, and the extra invocation caused by Polly's canonicalization is the most expensive one. We have seen this problem before, and I need to look into the details of the canonicalization passes related to "Combine redundant instructions".
OK.
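
Since the same pass name shows up more than once in a "-ftime-report" listing, it can help to sum the wall time per pass name before comparing the two runs. The following is only a minimal sketch: the helper name wall_time_by_pass and the variables with_polly and baseline are hypothetical, and the input is just the top-5 rows quoted above, so the totals are lower bounds rather than complete per-pass times.

```python
import re
from collections import defaultdict

# One timing column as it appears in the quoted rows: "0.0164 ( 20.8%)".
_COLUMN = re.compile(r"(\d+\.\d+)\s*\(\s*\d+\.\d+%\)")


def wall_time_by_pass(report_lines):
    """Sum the last timing column (wall time) per pass name.

    Assumes the row layout of the quoted listings: timing columns
    first, pass name last. Header and blank lines are skipped.
    """
    totals = defaultdict(float)
    for line in report_lines:
        line = line.lstrip("> ").strip()  # drop mail quoting
        columns = list(_COLUMN.finditer(line))
        if not columns:
            continue  # header row or empty line
        wall = float(columns[-1].group(1))
        name = line[columns[-1].end():].strip()
        totals[name] += wall
    return dict(totals)


# Top-5 rows copied from the two listings in this mail.
with_polly = [
    "---User Time--- --User+System-- ---Wall Time--- --- Name ---",
    "0.0160 ( 20.0%) 0.0160 ( 20.0%) 0.0164 ( 20.8%) Combine redundant instructions",
    "0.0120 ( 15.0%) 0.0120 ( 15.0%) 0.0138 ( 17.5%) X86 DAG->DAG Instruction Selection",
    "0.0040 ( 5.0%) 0.0040 ( 5.0%) 0.0045 ( 5.7%) Greedy Register Allocator",
    "0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0029 ( 3.7%) Global Value Numbering",
    "0.0040 ( 5.0%) 0.0040 ( 5.0%) 0.0028 ( 3.6%) Polly - Create polyhedral description of Scops",
]
baseline = [
    "---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---",
    "0.0120 ( 25.0%) 0.0000 ( 0.0%) 0.0120 ( 21.4%) 0.0141 ( 25.2%) X86 DAG->DAG Instruction Selection",
    "0.0040 ( 8.3%) 0.0000 ( 0.0%) 0.0040 ( 7.1%) 0.0047 ( 8.4%) Greedy Register Allocator",
    "0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0034 ( 6.1%) Combine redundant instructions",
    "0.0000 ( 0.0%) 0.0040 ( 50.0%) 0.0040 ( 7.1%) 0.0029 ( 5.2%) Global Value Numbering",
    "0.0040 ( 8.3%) 0.0000 ( 0.0%) 0.0040 ( 7.1%) 0.0029 ( 5.2%) Combine redundant instructions",
]

if __name__ == "__main__":
    polly_totals = wall_time_by_pass(with_polly)
    base_totals = wall_time_by_pass(baseline)
    for name in sorted(set(polly_totals) | set(base_totals)):
        print(f"{name}: polly={polly_totals.get(name, 0.0):.4f}s "
              f"clang={base_totals.get(name, 0.0):.4f}s")
```

Feeding the full "-ftime-report" output instead of the top-5 excerpt would give complete totals, because invocations outside the top 5 are not visible in the rows quoted here.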
> BTW, I want to point out that although the SCEV-based Polly canonicalization (with -polly-codegen-scev) runs faster than the default canonicalization, it leads to 5 extra compile errors and 3 extra runtime errors, as shown on http://188.40.87.11:8000/db_default/v4/nts/32?compare_to=34&baseline=34.
> I have done some basic analysis for one of the compile errors (7zip-benchmark). Results can be viewed on http://llvm.org/bugs/show_bug.cgi?id=17159
Great. I will help look into this starting this weekend.
Cheers,
Tobias