[PATCH] D102300: [LoopInterchange] Check lcssa phis in the inner latch in scenarios of multi-level nested loops

Sun May 30 19:53:24 PDT 2021

Whitney added inline comments.

================
Comment at: llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll:16
+; CHECK: Not interchanging loops. Cannot prove legality.
+define void @innermost_latch_uses_values_in_middle_header() {
+entry:
----------------
congzhe wrote:
> congzhe wrote:
> > Whitney wrote:
> > > congzhe wrote:
> > > > Whitney wrote:
> > > > > Is it possible to show the problem with input IR that the innermost and the middle loop is already interchanged?
> > > > Thanks for the comment! The difficulty is that loop interchange works on loops ordered from the innermost loop toward the outermost loop. Whenever we fail the legality check for a pair of loops, we just bail out the entire pass not trying to interchange the rest of loop nest. The pass does not "skip over" the first pair of candidate loops and go directly to the next one.
> > > > 
> > > > With the input IR that the innermost and the middle loop is already interchanged, when the pass starts working on this loop nest, it would do one of the following:
> > > > 
> > > > 1) The pass either determines it does not want to interchange the (new) innermost and the (new) middle loop likely due to profitability issue and will just bail out, not interchanging anything.
> > > > 
> > > > 2) (Even if we manually change the loop access pattern in the IR to make it pass profitability), it would interchange the the (new) innermost and the (new) middle loop which would result in a new IR that does not expose the problem we would like to expose.
> > > > 
> > > > Therefore I thought the current test case might be a in a sense a straightforward one to catch the problem.  
> > > > - With the trunk code, it run the 1st interchange and crashes on the 2nd interchange. 
> > > > - With this patch, it run the 1st interchange and bails out the 2nd interchange.
> > > > 
> > > > I'm wondering if what I described above makes sense to you? Please correct me if I missed something.
> > > > 
> > > When I run this test case with my opt, I got:
> > > ```
> > > Loops are legal to interchange
> > > Loops interchanged.
> > > Not legal because of current transform limitation
> > > Not interchanging loops. Cannot prove legality.
> > > ```
> > > Can you please check with the latest trunk code, and see if you still see a crash on the 2nd interchange?
> > > The commit-hash I tried with is `1e344ce4f3fac4beb5e23e04dc3dd59398125956`.
> > That's right, the latest trunk code does not crash because of my recent patch: https://reviews.llvm.org/D101305. That patch makes the pass bail out when it detects triangular loops, however, it also catches this IR accidently because SCEV fails to detect an lcssa variable is loop invariant. So the trunk code bails out and just hides the problem described in this patch. 
> > 
> > I can modify my other patch by adding some logic that deals with lcssa phis when calling SCEV, which will expose the crash again. Essentially what this means is that at the end of funciton `isLoopStructureUnderstood()`, I can simply change this piece of code 
> > ```
> > const SCEV *S = SE->getSCEV(Right);
> >     if (!SE->isLoopInvariant(S, OuterLoop))
> >       return false;
> > ```
> > to 
> > ```
> > const SCEV *S = SE->getSCEV(followLCSSA(Right));
> >     if (!SE->isLoopInvariant(S, OuterLoop))
> >       return false;
> > ```
> > 
> > If needed I can add that change to the current patch.
> Just to be more clear: in order to reproduce the crash, we can either checkout to a repo before my other patch (https://reviews.llvm.org/D101305) is committed, or do the change as described above (moving `followLCSSA()` to before `isLoopStructureUnderstood()` is also needed). 
> 
> Also for the test case please add `-loop-interchange-threshold=-100`  to enable transformation for the 2nd interchange. 
> 
> The problem is discovered from a large benchmark with multi-level loop nests. I tried to expose it through minimal IR but I apologize if it still looks not convenient enough to reproduce.
>From modifying your original test case, we can show the same problem with an easier logic. Instead of 2 steps and 3 level nested loop, we can show with only attempting to interchange once with a 2 level nested loop. 
Can you simplify all the comments/descriptions and test case?
```
@a = common global i32 0, align 4
@d = common dso_local local_unnamed_addr global [1 x [6 x i32]] zeroinitializer, align 4

define void @innermost_latch_uses_values_in_middle_header() {
entry:
  %0 = load i32, i32* @a, align 4
  %b = add i32 80, 1
  br label %outermost.header

outermost.header:                                 ; preds = %outermost.latch, %entry
  %indvar.outermost = phi i32 [ 10, %entry ], [ %indvar.outermost.next, %outermost.latch ]
  %tobool71.i = icmp eq i32 %0, 0
  br i1 %tobool71.i, label %innermost.header.preheader, label %outermost.latch

innermost.header.preheader:                       ; preds = %outermost.header
  br label %innermost.header

innermost.header:                                 ; preds = %innermost.header.preheader, %innermost.latch.split
  %indvar.innermost = phi i64 [ %1, %innermost.latch.split ], [ 4, %innermost.header.preheader ]
  %indvar.middle.wide = zext i32 %b to i64 ; a def in the middle header
  %arrayidx9.i = getelementptr inbounds [1 x [6 x i32]], [1 x [6 x i32]]* @d, i64 0, i64 %indvar.innermost, i64 %indvar.innermost
  store i32 0, i32* %arrayidx9.i, align 4
  br label %innermost.latch

innermost.latch:                                  ; preds = %innermost.body
  %indvar.innermost.next = add nsw i64 %indvar.innermost, 1
  %tobool5.i = icmp eq i64 %indvar.innermost.next, %indvar.middle.wide
  br label %innermost.latch.split

innermost.latch.split:                            ; preds = %middle.latch
  %indvar.middle.wide.lcssa = phi i64 [ %indvar.middle.wide, %innermost.latch ]
  %1 = add nsw i64 %indvar.innermost, 1
  %2 = icmp eq i64 %1, %indvar.middle.wide.lcssa
  br i1 %2, label %outermost.latch.loopexit, label %innermost.header

outermost.latch.loopexit:                         ; preds = %innermost.latch.split
  br label %outermost.latch

outermost.latch:                                  ; preds = %outermost.latch.loopexit, %outermost.header
  %indvar.outermost.next = add nsw i32 %indvar.outermost, -5
  %tobool.i = icmp eq i32 %indvar.outermost.next, 0
  br i1 %tobool.i, label %outermost.exit, label %outermost.header

outermost.exit:                                   ; preds = %outermost.latch
  ret void
}
```

And it looks to me that we could replace all uses of `%indvar.middle.wide.lcssa` with `%indvar.middle.wide` and remove `%indvar.middle.wide.lcssa` to continue interchange. Have you thought about that?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102300/new/

https://reviews.llvm.org/D102300