[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass
Star Tan
tanmx_star at yeah.net
Sun Jun 30 19:59:04 PDT 2013
At 2013-06-30 08:34:34, "Tobias Grosser" <tobias at grosser.es> wrote:
>On 06/29/2013 05:04 PM, Star Tan wrote:
>> Hi all,
>>
>>
>>
>> I have investigated the compile-time overhead of the "Polly SCoP Detection" pass based on LNT testing results.
>> This mail is to share some results I have found.
>>
>>
>> (1) Analysis of "SCOP Detection Pass" for PolyBench (Attached file PolyBench_SCoPs.log)
>> Experimental results show that the "SCOP Detection" pass does not add significant compile-time overhead when compiling PolyBench: the overhead caused by the "SCOP Detection" pass is usually less than 4% of the total compile time. Details for each benchmark can be found in the attached file SCoPs.tgz. I think this is because several other Polly passes, such as "Cloog code generation" and "Induction Variable Simplification", are much more expensive than the "SCOP Detection" pass.
>
>Good.
>
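For reference, the per-pass numbers in the attached logs come from LLVM's -time-passes pass timers. A rough way to reproduce similar numbers by hand would be something like the following (the exact flags depend on how Polly is built and loaded, so please take this only as a sketch; the file name is illustrative):

    # Generate unoptimized IR for one PolyBench kernel.
    clang -O0 -emit-llvm -S 2mm.c -o 2mm.ll

    # Run a Polly-enabled -O3 pipeline and print per-pass timings.
    opt -load LLVMPolly.so -O3 -polly -time-passes -disable-output 2mm.ll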
>> (2) Analysis of the "SCOP Detection Pass" for two hot benchmarks (tramp3d and oggenc)
>> The "SCOP Detection Pass" leads to significant compile-time overhead for these two benchmarks, tramp3d and oggenc, both of which are included in the LLVM test-suite/MultiSource.
>>
>>
>> The top 5 passes in compiling tramp3d are: (Attached file tramp3d-SCoPs.log)
>> ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
>> 6.0720 ( 21.7%) 0.0400 ( 2.1%) 6.1120 ( 20.5%) 6.1986 ( 20.6%) Polly - Detect static control parts (SCoPs)
>> 4.0600 ( 14.5%) 0.2000 ( 10.7%) 4.2600 ( 14.3%) 4.3655 ( 14.5%) X86 DAG->DAG Instruction Selection
>> 1.9880 ( 7.1%) 0.2080 ( 11.1%) 2.1960 ( 7.4%) 2.2277 ( 7.4%) Function Integration/Inlining
>> 1.7520 ( 6.3%) 0.0840 ( 4.5%) 1.8360 ( 6.2%) 1.8765 ( 6.2%) Global Value Numbering
>> 1.2440 ( 4.4%) 0.1040 ( 5.5%) 1.3480 ( 4.5%) 1.2925 ( 4.3%) Combine redundant instructions
>>
>>
>> and the top 5 passes in compiling oggenc are: (Attached file oggenc-SCoPs.log)
>> ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
>> 0.7760 ( 14.6%) 0.0280 ( 11.1%) 0.8040 ( 14.4%) 0.8207 ( 14.5%) X86 DAG->DAG Instruction Selection
>> 0.7080 ( 13.3%) 0.0040 ( 1.6%) 0.7120 ( 12.8%) 0.7317 ( 13.0%) Polly - Detect static control parts (SCoPs)
>> 0.4200 ( 7.9%) 0.0000 ( 0.0%) 0.4200 ( 7.5%) 0.4135 ( 7.3%) Polly - Calculate dependences
>> 0.3120 ( 5.9%) 0.0200 ( 7.9%) 0.3320 ( 6.0%) 0.2947 ( 5.2%) Loop Strength Reduction
>> 0.1720 ( 3.2%) 0.0080 ( 3.2%) 0.1800 ( 3.2%) 0.1992 ( 3.5%) Global Value Numbering
>>
>>
>> Results show that Polly spends a lot of time on detecting scops, but most of the candidate regions are eventually proved to be invalid scops. As a result, this pass wastes a lot of compile time. I think we should improve the pass so that invalid scops are rejected early.
>
>Great. Now we have two test cases we can work with. Can you
>upload the LLVM-IR produced by clang -O0 (without Polly)?
I have attached the LLVM-IR file for tramp3d and oggenc.
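They were produced with plain clang at -O0 and without Polly, roughly along these lines (the file names are only illustrative):

    clang++ -O0 -emit-llvm -S tramp3d-v4.cpp -o tramp3d.ll
    clang   -O0 -emit-llvm -S oggenc.c       -o oggenc.ll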
>
>The next step is to understand what is going on. Some ideas on how to
>understand what is going on:
>
>1) Reduce the amount of input code
>
>At best, we can reduce this to the single function on which the Polly
>scop detection takes more than 20.6% of the overall time. To get there,
>I propose to run the timings with 'opt -O3' (instead of clang). As a
>first step you disable inlining to make sure that your results are still
>reproducible. If this is the case, I would (semi-automatically?) try to
>reduce the test case by removing functions from it for which the removal
>does not reduce the Polly overhead.
Yes, the original LLVM-IR is too complex.
I will try to reduce the code size by investigating the hot functions in these two test cases.
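Concretely, I plan to redo the timings directly on the attached IR with something like

    opt -load LLVMPolly.so -O3 -polly -time-passes -disable-output tramp3d.ll

(I still need to check which opt flag disables the inliner in the -O3 pipeline), and then remove one function at a time from the .ll file and re-run the timing, keeping a function only if removing it does not reduce the share of time spent in the scop detection.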
>
>2) Check why the Polly scop detection is failing
>
>You can use 'opt -polly-detect -analyze' to see the most common reasons
>the scop detection failed. We should verify that we perform the most
>common and cheap tests early.
I will dig into the details of these two test cases.
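Concretely, I plan to run something like

    opt -load LLVMPolly.so -polly-canonicalize -S tramp3d.ll -o tramp3d.preopt.ll
    opt -load LLVMPolly.so -basicaa -polly-detect -analyze tramp3d.preopt.ll

and count the most frequent rejection reasons. (The -polly-canonicalize step is my guess at how to bring the IR into the form the -O3 pipeline would hand to the scop detection; I will double-check that it matches.)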
>
>> (3) About detecting scop regions in bottom-up order.
>> Detecting scop regions in bottom-up order can significantly speed up the scop detection pass. However, as I have discussed with Sebastian, detecting scops in bottom-up order and in top-down order can lead to different results. As a result, we should not simply change the detection order.
>
>Sebastian had a patch for this. Does his patch improve the scop
>detection time?
>
>I agree with you that we cannot just switch the order in which we
>detect scops, but I still believe that a bottom-up detection is the
>right way to go. However, to abort early we need to classify the scop
>detection failures into failures that will equally hold for larger scops
>and ones that may disappear in larger scops. Only if a failure of the
>first class is detected can we abort early without reducing scop coverage.
>
>> Do you have any other suggestions that may speed up the scop detection pass?
>
>I believe bottom-up detection may be a good thing, but before drawing
>conclusions we need to understand where time is actually spent. The
>suggestions above should help us there.
>
>Cheers
>Tobi
>
>P.S.: Please do not copy llvm-commits, as it is only for patches and
>commit messages.
Thanks again for your suggestions.
Best wishes,
Star Tan
Cloud attachment sent from NetEase Yeah.net Mail:
oggenc_tramp3d_llvm_ir.tgz (2.11M, available until 2013-07-16 10:58)