[LLVMdev] [Polly][GSOC2013] FastPolly -- SCOP Detection Pass

Tobias Grosser tobias at grosser.es
Sat Jun 29 17:34:34 PDT 2013


On 06/29/2013 05:04 PM, Star Tan wrote:
> Hi all,
>
>
>
> I have investigated the compile-time overhead of "Polly Scop Detection" pass based on LNT testing results.
> This mail is to share some results I have found.
>
>
> (1) Analysis of "SCOP Detection Pass" for PolyBench (Attached file PolyBench_SCoPs.log)
> Experimental results show that the "SCoP Detection" pass does not add significant compile-time overhead when compiling PolyBench: it usually accounts for less than 4% of the total compile time. Details for each benchmark can be seen in the attached file SCoPs.tgz. I think this is because several other Polly passes, such as "CLooG code generation" and "Induction Variable Simplification", are much more expensive than the "SCoP Detection" pass.

Good.

> (2) Analysis of the "SCoP Detection Pass" for two hot benchmarks (tramp3d and oggenc)
> The "SCoP Detection Pass" leads to significant compile-time overhead for two benchmarks, tramp3d and oggenc, both of which are included in the LLVM test-suite/MultiSource.
>
>
> The top 5 passes in compiling tramp3d are: (Attached file tramp3d-SCoPs.log)
>     ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>     6.0720 ( 21.7%)   0.0400 (  2.1%)   6.1120 ( 20.5%)   6.1986 ( 20.6%)  Polly - Detect static control parts (SCoPs)
>     4.0600 ( 14.5%)   0.2000 ( 10.7%)   4.2600 ( 14.3%)   4.3655 ( 14.5%)  X86 DAG->DAG Instruction Selection
>     1.9880 (  7.1%)   0.2080 ( 11.1%)   2.1960 (  7.4%)   2.2277 (  7.4%)  Function Integration/Inlining
>     1.7520 (  6.3%)   0.0840 (  4.5%)   1.8360 (  6.2%)   1.8765 (  6.2%)  Global Value Numbering
>     1.2440 (  4.4%)   0.1040 (  5.5%)   1.3480 (  4.5%)   1.2925 (  4.3%)  Combine redundant instructions
>
>
> and the top 5 passes in compiling oggenc are: (Attached file oggenc-SCoPs.log)
>     ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
>     0.7760 ( 14.6%)   0.0280 ( 11.1%)   0.8040 ( 14.4%)   0.8207 ( 14.5%)  X86 DAG->DAG Instruction Selection
>     0.7080 ( 13.3%)   0.0040 (  1.6%)   0.7120 ( 12.8%)   0.7317 ( 13.0%)  Polly - Detect static control parts (SCoPs)
>     0.4200 (  7.9%)   0.0000 (  0.0%)   0.4200 (  7.5%)   0.4135 (  7.3%)  Polly - Calculate dependences
>     0.3120 (  5.9%)   0.0200 (  7.9%)   0.3320 (  6.0%)   0.2947 (  5.2%)  Loop Strength Reduction
>     0.1720 (  3.2%)   0.0080 (  3.2%)   0.1800 (  3.2%)   0.1992 (  3.5%)  Global Value Numbering
>
>
> Results show that Polly spends a lot of time on detecting scops, but most of the candidate regions are eventually proved to be invalid. As a result, this pass wastes a lot of compile time. I think we should improve this pass by detecting invalid scops early.

Great. Now we have two test cases we can work with. Can you
upload the LLVM-IR produced by clang -O0 (without Polly)?

The next step is to understand what is going on. Some ideas on how to
do that:

1) Reduce the amount of input code

At best, we can reduce this to the single function on which the Polly
scop detection takes more than 20.6% of the overall time. To get there,
I propose to run the timings with 'opt -O3' (instead of clang). As a
first step, you should disable inlining to make sure that your results
are still reproducible. If they are, I would (semi-automatically?) try
to reduce the test case by removing functions whose removal does not
reduce the Polly overhead.
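The semi-automatic reduction could look roughly like the following sketch. Everything here is an assumption for illustration: `still_slow` stands in for "re-run the timings (e.g. 'opt -O3 -time-passes' on the reduced module) and check whether the Polly scop detection overhead is still reproducible"; it is not a real Polly or LLVM API.

```python
def reduce_functions(functions, still_slow):
    """Greedily drop functions whose removal keeps the slowdown reproducible.

    `functions` is a list of function names in the module; `still_slow` is a
    user-supplied predicate that re-runs the timing on a candidate subset
    (an assumption, not part of Polly).
    """
    kept = list(functions)
    changed = True
    while changed:
        changed = False
        for f in list(kept):
            candidate = [g for g in kept if g != f]
            # Keep the removal only if the overhead is still reproducible.
            if candidate and still_slow(candidate):
                kept = candidate
                changed = True
    return kept
```

For example, if only a single hot kernel triggers the slowdown, the reducer converges on it: `reduce_functions(["init", "hot_kernel", "print_stats"], lambda fs: "hot_kernel" in fs)` returns `["hot_kernel"]`.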

2) Check why the Polly scop detection is failing

You can use 'opt -polly-detect -analyze' to see the most common reasons
the scop detection fails. We should verify that we perform the most
common and cheap tests early.
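The "cheap and common tests first" idea can be sketched as follows. The check names and the region representation are made up for illustration; they are not Polly's actual rejection checks.

```python
def detect_scop(region, checks):
    """Run detection checks cheapest-first and abort on the first failure.

    `checks` is a list of (name, cost, predicate) tuples; `region` is an
    opaque object the predicates inspect. Both are assumptions for this
    sketch. Returns (ok, failing_check_name_or_None).
    """
    for name, _cost, passes in sorted(checks, key=lambda c: c[1]):
        if not passes(region):
            # Abort early: the expensive checks are never run for this region.
            return False, name
    return True, None
```

With a cheap "no function calls" check (cost 1) and an expensive "affine accesses" check (cost 10), a region containing a call is rejected by the cheap check alone, which is exactly the saving we want for the many regions that turn out to be invalid.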

> (3) About detecting scop regions in bottom-up order.
> Detecting scop regions in bottom-up order can significantly speed up the scop detection pass. However, as I have discussed with Sebastian, detecting scops in bottom-up order and up-bottom order will lead to different results. As a result, we should not change the detection order.

Sebastian had a patch for this. Does his patch improve the scop
detection time?

I agree with you that we cannot just switch the order in which we
detect scops, but I still believe that bottom-up detection is the
right way to go. However, to abort early we need to classify the scop
detection failures into those that will equally hold for larger scops
and those that may disappear in larger scops. Only if a failure of the
first class is detected can we abort early without reducing scop coverage.
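A minimal sketch of that classification, with invented failure-kind names (they do not correspond to Polly's real rejection reasons):

```python
# Class one: failures that hold for any enclosing region as well, e.g. a call
# to an unknown function stays inside every larger region that contains it.
FATAL = {"unknown_function_call", "unmodelable_aliasing"}

# Class two: failures that may disappear when the region grows, e.g. a value
# that is non-affine here might become analyzable in a larger scop.
RECOVERABLE = {"non_affine_branch", "unknown_loop_bound"}

def may_abort_early(failure_kind):
    """Bottom-up detection may stop expanding only on class-one failures."""
    return failure_kind in FATAL
```

With this split, a bottom-up detection can give up on a region tree as soon as a fatal failure appears, while recoverable failures merely mean "keep growing the region", so coverage is not reduced.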

> Do you have any other suggestions that may speed up the scop detection pass?

I believe bottom-up detection may be a good thing, but before drawing
conclusions we need to understand where the time is actually spent. The
suggestions above should help us there.

Cheers
Tobi

P.S.: Please do not copy llvm-commits, as it is only for patches and
commit messages.
