[PATCH] D75628: [DA] [SCEV] Provide facility to check for total ordering based on dominance

Thu Mar 26 09:12:01 PDT 2020

bmahjour marked an inline comment as done.
bmahjour added inline comments.

================
Comment at: llvm/lib/Analysis/DependenceAnalysis.cpp:2566
   bool Disproved = false;
   if (testBounds(Dependence::DVEntry::ALL, 0, Bound, Delta)) {
     // Explore the direction vector hierarchy.
----------------
Meinersbur wrote:
> bmahjour wrote:
> > Meinersbur wrote:
> > > bmahjour wrote:
> > > > Meinersbur wrote:
> > > > > Is it even correct to call `testBounds` with SCEVExprs bounds with disjoint loop nests? Maybe the bug is in to consider `MaxLevels` instead of `CommonLevels`.
> > > > The Banerjee Inequality requires that delta between the coefficients of the first terms of the Diophantine function `h` be in between LB and UB, where LB and UB are defined to be the sum of min/max quantities over **all** loop levels (not just the common levels), so the code is correct in looping over all loops including the siblings. This is more of a limitation in SCEV not being able to represent the bounds when they are non-invariant recurrences that straddle non-dominating loops.
> > > > 
> > > > I'm actually not quite sure why there is this assumption in SCEV about dominance. Do you know?
> > > > I'm actually not quite sure why there is this assumption in SCEV about dominance. Do you know?
> > > 
> > > AFAIU it is not about dominance, but that SCEVAddRecExpr only being valid within the loop (in the loop applies dominance by the loop header). If outside the loop, `getSCEVAtScope` 'removes' the SCEVAddRecExprs for loops it is not in. Using SCEVAddRecExprs from sibling loops in the same expression would be a contradiction as it cannot be valid in both loops at the same time.
> > > 
> > > I don't know how `DependenceInfo` is supposed to work comparing instructions in sibling loops, but I'd suspect the problem being there, not in SCEV. Maybe looking up
> > > ```
> > > // Use Banerjee's Inequalities to test an MIV subscript pair.
> > > // (Wolfe, in the race-car book, calls this the Extreme Value Test.)
> > > // Generally follows the discussion in Section 2.5.2 of
> > > //
> > > //    Optimizing Supercompilers for Supercomputers
> > > //    Michael Wolfe
> > > //
> > > ```
> > > could help.
> > > Using SCEVAddRecExprs from sibling loops in the same expression would be a contradiction as it cannot be valid in both loops at the same time.
> > Suppose you have two non-nested sibling loops like this:
> > 
> > ```
> > int i, j;
> > for (i = 0; i < n; i++)
> >   ...
> > for (j = 0; j < n; j++)
> >   ...
> > 
> >  ... = i + j;
> > ```
> > I can see that if we add `{0,+,1}<j_loop>` and `{0,+,1}<i_loop>` together to form a new SCEV expression it cannot be evaluated in either of the loops. However, if we evaluate `i+j` after both loops and passing `nullptr` to `getSCEVAtScope` we expect to get an expression that evaluates to `2*n`. So theoretically we should be able to represent the expression symbolically, and only fail if trying to evaluate it in either the i-loop or the j-loop. Currently SCEV does not even support creating such an expression symbolically (as per asserts in `CompareSCEVComplexity` and `isKnownViaInduction`), instead of asserting in `getSCEVAtScope`. 
> > 
> > 
> > > I don't know how DependenceInfo is supposed to work comparing instructions in sibling loops, but I'd suspect the problem being there, not in SCEV. Maybe looking up...
> > 
> > I don't have the "race-car" book, but I did lookup the equations in //Optimizing Compilers for Modern Architectures (by  Randy Allen & Ken Kennedy)// and what the `DependenceInfo` is trying to do makes sense to me. There is a special form of Banerjee Inequality for trapezoidal and triangular loops which is much more complicated, but is expected to produce more accurate results, however applying the original test to trapezoidal/triangular loops should still produce correct, but possibly over-conservative, results.
> When using `i` or `j`, `getSCEVAtScope` should be applied before combining with other SCEVs to ensure that we get the right representation for where it is used. That is, `getSCEVAtScope(i) + getSCEVAtScope(j)` instead of `getSCEVAtScope(i+j)`. It might be that nothing interesting happens during `ScalarEvolution::getAddExpr`. However, we'd get into problems if the normalization of SCEVAdd makes use of properties that are only valid within the loop.
> 
> Note that this is not an issue when in LCSAA, since the uses of %i and %j after the loops would be $l.lcssa respectively %j.lcssa representing the exit values already. As would if we'd put one of the loops inside an if (e.g. loop guard), forcing an exit node phi:
> ```
> if (c) {
>   for (i = 0; i < n; i++) 
>     ...
> }
> for (j = 0; j < n; j++)
>   ...
> ```
> Using the SCEVAddRec after the if would be illegal as it ignores what the value of %i would be if `c` was false. It is also wrong according to the dominator assumption for SCEV, probably because of this very reason. For dependence check between the loops, IMHO for the value of %i inside the %j loop, it should use the exit value of %i, not the AddRecExpr at all, even without the conditional. That is, handle it as if it was LCSSA. I think in your test case it is because the AddRecExpr is itself inside another loop and hence not dominating the other loop body.
> What does //Optimizing Compilers for Modern Architectures// about this case? Maybe we could ask the original author Preston Briggs <preston.briggs at gmail.com>.
> 
> I do have the race-car book and I looked up the Extreme Value Test. The book ignores cases where statements are not in the same loop.
> 
I agree LCSSA can avoid this issue, but we don't (shouldn't) require LCSSA form for dependence analysis to work.

>From what I can see, getSCEVAtScope does not consider control flow structures at all when querying at top-level scope (ie we cannot distinguish between guarded vs non-guarded top-level scopes), so even if we did `getSCEVAtScope(i) + getSCEVAtScope(j)` we would still have the problem that `i-loop` may not have been executed.

I suppose the control flow issue may be solvable with predication, but even then we should still be able to symbolically represent the expressions and perform simplification before trying to evaluate with getSCEVAtScope. For example `getSCEVAtScope(i<c>+j-i<c>)` gets simplified to `getSCEVAtScope(j)` which can be evaluated without violating the dominance relationship. Maybe the assertions should move to getSCEVAtScope? @reames Phillip could you please share your thoughts on this?

> IMHO for the value of %i inside the %j loop, it should use the exit value of %i, not the AddRecExpr at all, even without the conditional. That is, handle it as if it was LCSSA

In order to do that first we need to detect if we are in a situation where one loop does not dominate another (similar to the proposed solution in this patch) and if so use the exit values. The exit values would have to be computed using `getSCEVAtScope` I assume, but `getSCEVAtScope` may return the input SCEV if it cannot compute the exit value which puts us back at square one.

For the case where we have a conditionally guarded loop, we can conservatively assume that the condition is always true and add the loop bounds to the set of constraints being considered. Since the dependence is disproved only if the Diophantine function is outside of the sum of min/max values for all the bounds, by considering the guarded loop bound we are only tightening the constraint and being more conservative. I'll ask Preston to confirm this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75628/new/

https://reviews.llvm.org/D75628