[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

Sun Jun 30 19:59:04 PDT 2013

At 2013-06-30 08:34:34,"Tobias Grosser" <tobias at grosser.es> wrote:

>On 06/29/2013 05:04 PM, Star Tan wrote:
>> Hi all,
>>
>>
>>
>> I have investigated the compile-time overhead of "Polly Scop Detection" pass based on LNT testing results.
>> This mail is to share some results I have found.
>>
>>
>> (1) Analysis of "SCOP Detection Pass" for PolyBench (Attached file PolyBench_SCoPs.log)
>> Experimental results show that the "SCOP Detection pass" does not lead to significant extra compile-time overhead for compiling PolyBench. The percent of compile-time overhead caused by "SCOP Detection Pass" is usually less than 4% of total compile-time. Details for each benchmark can be seen in attached file SCoPs.tgz. I think this is because a lot of other Polly passes, such as "Cloog code generation" and "Induction Variable Simplification" are much more expensive than the "SCOP Detection Pass".
>
>Good.
>
>> (2) Analysis of "SCOP Detection Pass for two hot benchmarks (tramp3d and oggenc)
>> “SCOP Detection Pass" would lead to significant compile-time overhead for these two benchmarks: tramp3d and oggenc, both of which are included in LLVM test-suite/MultiSource.
>>
>>
>> The top 5 passes in compiling tramp3d are: (Attached file tramp3d-SCoPs.log)
>>     ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>>     6.0720 ( 21.7%)   0.0400 (  2.1%)   6.1120 ( 20.5%)   6.1986 ( 20.6%)  Polly - Detect static control parts (SCoPs)
>>     4.0600 ( 14.5%)   0.2000 ( 10.7%)   4.2600 ( 14.3%)   4.3655 ( 14.5%)  X86 DAG->DAG Instruction Selection
>>     1.9880 (  7.1%)   0.2080 ( 11.1%)   2.1960 (  7.4%)   2.2277 (  7.4%)  Function Integration/Inlining
>>     1.7520 (  6.3%)   0.0840 (  4.5%)   1.8360 (  6.2%)   1.8765 (  6.2%)  Global Value Numbering
>>     1.2440 (  4.4%)   0.1040 (  5.5%)   1.3480 (  4.5%)   1.2925 (  4.3%)  Combine redundant instructions
>>
>>
>> and the top 5 passes in compiling oggenc are: (Attached file oggenc-SCoPs.log)
>>     ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>>     0.7760 ( 14.6%)   0.0280 ( 11.1%)   0.8040 ( 14.4%)   0.8207 ( 14.5%)  X86 DAG->DAG Instruction Selection
>>     0.7080 ( 13.3%)   0.0040 (  1.6%)   0.7120 ( 12.8%)   0.7317 ( 13.0%)  Polly - Detect static control parts (SCoPs)
>>     0.4200 (  7.9%)   0.0000 (  0.0%)   0.4200 (  7.5%)   0.4135 (  7.3%)  Polly - Calculate dependences
>>     0.3120 (  5.9%)   0.0200 (  7.9%)   0.3320 (  6.0%)   0.2947 (  5.2%)  Loop Strength Reduction
>>     0.1720 (  3.2%)   0.0080 (  3.2%)   0.1800 (  3.2%)   0.1992 (  3.5%)  Global Value Numbering
>>
>>
>> Results show that Polly spends a lot of time on detecting scops, but most of region scops are proved to be invalid at last. As a result, this pass waste a lot of compile-time.I think we should improve this pass by detect invalid scop early.
>
>Great. Now we have two test cases we can work with. Can you
>upload the LLVM-IR produced by clang -O0 (without Polly)?
I have attached the LLVM-IR file for tramp3d and oggenc.
>
>The next step is to understand what is going on. Some ideas on how to
>understand what is going on:
>
>1) Reduce the amount of input code
>
>At best, we can reduce this to the single function on which the Polly 
>scop detection takes more than 20.6% of the overall time. To get there,
>I propose to run the timings with 'opt -O3' (instead of clang). As a 
>first step you disable inlining to make sure that your results are still 
>reproducible. If this is the case, I would (semi-automatically?) try to 
>reduce the test case by removing functions from it for which the removal 
>does not reduce the Polly overhead.
Yes, the original LLVM IR code is too complex.
I would try to reduce the code size by investigating some hot functions in these two testcases.
>
>2) Check why the Polly scop detection is failing
>
>You can use 'opt -polly-detect -analyze' to see the most common reasons 
>the scop detection failed. We should verify that we perform the most 
>common and cheap tests early.
I would dig into the details of these two testcases.
>
>> (3) About detecting scop regions in bottom-up order.
>> Detecting scop regions in bottom-up order can significantly speed up the scop detection pass. However, as I have discussed with Sebastian, detecting scops in bottom-up order and up-bottom order will lead to different results. As a result, we should not change the detection order.
>
>Sebastian had a patch for this. Does his patch improve the scop 
>detection time.
>
>I agree with you that we can not just switch the order in which we 
>detect scops, but I still believe that a bottom up detection is the 
>right way to go. However, to abort early we need to classify the scop
>detection failures into failures that will equally hold for larger scops
>and ones that may disappear in larger scops. Only if a failure of the 
>first class was detected, we can abort early without reducing scop coverage.
>
>> Do you have any other suggestions that may speed up the scop detection pass?
>
>I believe bottom-up detection may be a good thing, but before drawing 
>conclusions we need to understand where time is actually spent. The 
>suggestions above should help us there.
>
>Cheers
>Tobi
>
>P.S.: Please do not copy llvm-commits. as it is only for patches and 
>commit messages.

Thanks for your suggestions again.
Best wishes,
Star Tan
从网易yeah邮箱发来的云附件
oggenc_tramp3d_llvm_ir.tgz (2.11M, 2013年7月16日 10:58 到期)
下载
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/2e79a95a/attachment.html>