[PATCH] D30350: [LSR] Add a cap for reassociation of AllFixupsOutsideLoop type LSRUse to protect compile time

Quentin Colombet via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 24 23:58:39 PDT 2017


qcolombet added a comment.

Hi Wei,

Sorry for the delay, I thought Andy's comment was clear enough for you to explore other directions. So I didn't pay attention to the review.

Anyhow, at this point you would have guessed it, but I agree with Andy. I think this patch is a workaround for a problem we yet don't understand.

I wanted to provide a patch to demonstrate Andy's point and what I found when playing around is interesting. (Patch attached nonetheless: F3172128: experimental_lsr_reassoc.diff <https://reviews.llvm.org/F3172128> )

TL;DR I believe the problem is in the SCEVExpander code and not in LSR per se, I would suggest to recreate a profile using `opt -loop-reduce` and look where the time is spent.

Basically, I wanted to limit the number of reassociations per recursive call to 5. The max depth of the call is 3. The call happens for each base register (typically less than 3) of all formulae (typically a couple) for each LSRUse.
So, that gives us a total of: 5 * 3 * 3 * 2 = 90 reassociation formulae per LSRUse and given that they are typically 4 uses per loop, the grant total is less than 400, which does not sound too bad to explore.

However, with such limit, the compile time in the example from PR32042 was already blowing up (I killed it after 4 min).

So, I tried with a limit of 2 reassociations per call, i.e., about 36 formulae per LSRUse (to be exact the example generate 25 formulae), and still opt didn't finish after 4 min.

In the end, the only way to get a reasonable compile time was to not allow any reassociation. I started to become suspicious that the problem is not in the size of the search space and after doing a quick profile, I found that at least 80% of the compile time is spent materializing the solution in SCEVExpander. The problem seems to be related to the "nesting-level" of the expression of the formula that the solver picks.
Apply my patch and compare the output of lsr-num-reassoc=1 and lsr-num-reassoc=2 to see how the nesting level materialize.

Admittedly my profile may be bogus (my Instruments is quite old and the trace looked weird), but I would nonetheless recommend to look into creating a proper profile and investigating what is going on instead of bypassing the whole reassociation process.

That being said, I also understand that this patch blocks some people, so I would be okay for you to push it now as long as you commit to look at the actual problem in a reasonable time. In other words, if this patch is really needed to unblock people, go for it, but I want a different solution in a near future.

Cheers,
-Quentin



================
Comment at: lib/Transforms/Scalar/LoopStrengthReduce.cpp:3429
+  // Arbitrarily choose a cap size of AddOps to protect compile time.
+  if (AddOps.size() >= 5 && LU.AllFixupsOutsideLoop)
+    return;
----------------
wmi wrote:
> davide wrote:
> > How did you chose this cap, BTW? 
> > Maybe we should do some measurements and pick a less arbitrary value?
> On one side, for AddOps.size()==5, even without the workaround, the compile time will not bloat up too much, so I think it can cover the cases suffering compile time issue. On the other side, not many cases have very large AddOps numbers. I choose 5 which is not a too small number so it will not kick in frequently. Since current model handling AllFixupsOutsideLoop is very imprecise,  I think it is better to just fix the compile time issue and don't change the existing code. 
> 
> I found no regressions using some internal benchmarks.
> 
> 
What is the highest number you get from the LLVM test suite?
Same for clang self-host?


================
Comment at: lib/Transforms/Scalar/LoopStrengthReduce.cpp:3430
+  if (AddOps.size() >= 5 && LU.AllFixupsOutsideLoop)
+    return;
+
----------------
sanjoy wrote:
> I'm not too familiar with LSR, but this looks pretty ad-hoc -- why can't the same compile-time blowup happen for an `LSRUse` with `AllFixupsOutsideLoop` = `false`?
I agree with Sanjoy that there is nothing preventing this for happening when AllFixupsOutsideLoop is false.


Repository:
  rL LLVM

https://reviews.llvm.org/D30350





More information about the llvm-commits mailing list