[PATCH] D75628: [DA] [SCEV] Provide facility to check for total ordering based on dominance

Mon Mar 9 12:25:20 PDT 2020

Meinersbur added inline comments.

================
Comment at: llvm/lib/Analysis/DependenceAnalysis.cpp:2566
   bool Disproved = false;
   if (testBounds(Dependence::DVEntry::ALL, 0, Bound, Delta)) {
     // Explore the direction vector hierarchy.
----------------
bmahjour wrote:
> Meinersbur wrote:
> > bmahjour wrote:
> > > Meinersbur wrote:
> > > > Is it even correct to call `testBounds` with SCEVExprs bounds with disjoint loop nests? Maybe the bug is in to consider `MaxLevels` instead of `CommonLevels`.
> > > The Banerjee Inequality requires that delta between the coefficients of the first terms of the Diophantine function `h` be in between LB and UB, where LB and UB are defined to be the sum of min/max quantities over **all** loop levels (not just the common levels), so the code is correct in looping over all loops including the siblings. This is more of a limitation in SCEV not being able to represent the bounds when they are non-invariant recurrences that straddle non-dominating loops.
> > > 
> > > I'm actually not quite sure why there is this assumption in SCEV about dominance. Do you know?
> > > I'm actually not quite sure why there is this assumption in SCEV about dominance. Do you know?
> > 
> > AFAIU it is not about dominance, but that SCEVAddRecExpr only being valid within the loop (in the loop applies dominance by the loop header). If outside the loop, `getSCEVAtScope` 'removes' the SCEVAddRecExprs for loops it is not in. Using SCEVAddRecExprs from sibling loops in the same expression would be a contradiction as it cannot be valid in both loops at the same time.
> > 
> > I don't know how `DependenceInfo` is supposed to work comparing instructions in sibling loops, but I'd suspect the problem being there, not in SCEV. Maybe looking up
> > ```
> > // Use Banerjee's Inequalities to test an MIV subscript pair.
> > // (Wolfe, in the race-car book, calls this the Extreme Value Test.)
> > // Generally follows the discussion in Section 2.5.2 of
> > //
> > //    Optimizing Supercompilers for Supercomputers
> > //    Michael Wolfe
> > //
> > ```
> > could help.
> > Using SCEVAddRecExprs from sibling loops in the same expression would be a contradiction as it cannot be valid in both loops at the same time.
> Suppose you have two non-nested sibling loops like this:
> 
> ```
> int i, j;
> for (i = 0; i < n; i++)
>   ...
> for (j = 0; j < n; j++)
>   ...
> 
>  ... = i + j;
> ```
> I can see that if we add `{0,+,1}<j_loop>` and `{0,+,1}<i_loop>` together to form a new SCEV expression it cannot be evaluated in either of the loops. However, if we evaluate `i+j` after both loops and passing `nullptr` to `getSCEVAtScope` we expect to get an expression that evaluates to `2*n`. So theoretically we should be able to represent the expression symbolically, and only fail if trying to evaluate it in either the i-loop or the j-loop. Currently SCEV does not even support creating such an expression symbolically (as per asserts in `CompareSCEVComplexity` and `isKnownViaInduction`), instead of asserting in `getSCEVAtScope`. 
> 
> 
> > I don't know how DependenceInfo is supposed to work comparing instructions in sibling loops, but I'd suspect the problem being there, not in SCEV. Maybe looking up...
> 
> I don't have the "race-car" book, but I did lookup the equations in //Optimizing Compilers for Modern Architectures (by  Randy Allen & Ken Kennedy)// and what the `DependenceInfo` is trying to do makes sense to me. There is a special form of Banerjee Inequality for trapezoidal and triangular loops which is much more complicated, but is expected to produce more accurate results, however applying the original test to trapezoidal/triangular loops should still produce correct, but possibly over-conservative, results.
When using `i` or `j`, `getSCEVAtScope` should be applied before combining with other SCEVs to ensure that we get the right representation for where it is used. That is, `getSCEVAtScope(i) + getSCEVAtScope(j)` instead of `getSCEVAtScope(i+j)`. It might be that nothing interesting happens during `ScalarEvolution::getAddExpr`. However, we'd get into problems if the normalization of SCEVAdd makes use of properties that are only valid within the loop.

Note that this is not an issue when in LCSAA, since the uses of %i and %j after the loops would be $l.lcssa respectively %j.lcssa representing the exit values already. As would if we'd put one of the loops inside an if (e.g. loop guard), forcing an exit node phi:
```
if (c) {
  for (i = 0; i < n; i++) 
    ...
}
for (j = 0; j < n; j++)
  ...
```
Using the SCEVAddRec after the if would be illegal as it ignores what the value of %i would be if `c` was false. It is also wrong according to the dominator assumption for SCEV, probably because of this very reason. For dependence check between the loops, IMHO for the value of %i inside the %j loop, it should use the exit value of %i, not the AddRecExpr at all, even without the conditional. That is, handle it as if it was LCSSA. I think in your test case it is because the AddRecExpr is itself inside another loop and hence not dominating the other loop body.
What does //Optimizing Compilers for Modern Architectures// about this case? Maybe we could ask the original author Preston Briggs <preston.briggs at gmail.com>.

I do have the race-car book and I looked up the Extreme Value Test. The book ignores cases where statements are not in the same loop.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75628/new/

https://reviews.llvm.org/D75628