[PATCH] D102300: [LoopInterchange] Check lcssa phis in the inner latch in scenarios of multi-level nested loops

Congzhe Cao via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 25 13:00:42 PDT 2021


congzhe added inline comments.


================
Comment at: llvm/test/Transforms/LoopInterchange/innermost-latch-uses-values-in-middle-header.ll:16
+; CHECK: Not interchanging loops. Cannot prove legality.
+define void @innermost_latch_uses_values_in_middle_header() {
+entry:
----------------
Whitney wrote:
> congzhe wrote:
> > Whitney wrote:
> > > congzhe wrote:
> > > > Whitney wrote:
> > > > > Whitney wrote:
> > > > > > congzhe wrote:
> > > > > > > congzhe wrote:
> > > > > > > > Whitney wrote:
> > > > > > > > > congzhe wrote:
> > > > > > > > > > Whitney wrote:
> > > > > > > > > > > Is it possible to show the problem with input IR that the innermost and the middle loop is already interchanged?
> > > > > > > > > > Thanks for the comment! The difficulty is that loop interchange works on loops ordered from the innermost loop toward the outermost loop. Whenever we fail the legality check for a pair of loops, we just bail out the entire pass not trying to interchange the rest of loop nest. The pass does not "skip over" the first pair of candidate loops and go directly to the next one.
> > > > > > > > > > 
> > > > > > > > > > With the input IR that the innermost and the middle loop is already interchanged, when the pass starts working on this loop nest, it would do one of the following:
> > > > > > > > > > 
> > > > > > > > > > 1) The pass either determines it does not want to interchange the (new) innermost and the (new) middle loop likely due to profitability issue and will just bail out, not interchanging anything.
> > > > > > > > > > 
> > > > > > > > > > 2) (Even if we manually change the loop access pattern in the IR to make it pass profitability), it would interchange the the (new) innermost and the (new) middle loop which would result in a new IR that does not expose the problem we would like to expose.
> > > > > > > > > > 
> > > > > > > > > > Therefore I thought the current test case might be a in a sense a straightforward one to catch the problem.  
> > > > > > > > > > - With the trunk code, it run the 1st interchange and crashes on the 2nd interchange. 
> > > > > > > > > > - With this patch, it run the 1st interchange and bails out the 2nd interchange.
> > > > > > > > > > 
> > > > > > > > > > I'm wondering if what I described above makes sense to you? Please correct me if I missed something.
> > > > > > > > > > 
> > > > > > > > > When I run this test case with my opt, I got:
> > > > > > > > > ```
> > > > > > > > > Loops are legal to interchange
> > > > > > > > > Loops interchanged.
> > > > > > > > > Not legal because of current transform limitation
> > > > > > > > > Not interchanging loops. Cannot prove legality.
> > > > > > > > > ```
> > > > > > > > > Can you please check with the latest trunk code, and see if you still see a crash on the 2nd interchange?
> > > > > > > > > The commit-hash I tried with is `1e344ce4f3fac4beb5e23e04dc3dd59398125956`.
> > > > > > > > That's right, the latest trunk code does not crash because of my recent patch: https://reviews.llvm.org/D101305. That patch makes the pass bail out when it detects triangular loops, however, it also catches this IR accidently because SCEV fails to detect an lcssa variable is loop invariant. So the trunk code bails out and just hides the problem described in this patch. 
> > > > > > > > 
> > > > > > > > I can modify my other patch by adding some logic that deals with lcssa phis when calling SCEV, which will expose the crash again. Essentially what this means is that at the end of funciton `isLoopStructureUnderstood()`, I can simply change this piece of code 
> > > > > > > > ```
> > > > > > > > const SCEV *S = SE->getSCEV(Right);
> > > > > > > >     if (!SE->isLoopInvariant(S, OuterLoop))
> > > > > > > >       return false;
> > > > > > > > ```
> > > > > > > > to 
> > > > > > > > ```
> > > > > > > > const SCEV *S = SE->getSCEV(followLCSSA(Right));
> > > > > > > >     if (!SE->isLoopInvariant(S, OuterLoop))
> > > > > > > >       return false;
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > If needed I can add that change to the current patch.
> > > > > > > Just to be more clear: in order to reproduce the crash, we can either checkout to a repo before my other patch (https://reviews.llvm.org/D101305) is committed, or do the change as described above (moving `followLCSSA()` to before `isLoopStructureUnderstood()` is also needed). 
> > > > > > > 
> > > > > > > Also for the test case please add `-loop-interchange-threshold=-100`  to enable transformation for the 2nd interchange. 
> > > > > > > 
> > > > > > > The problem is discovered from a large benchmark with multi-level loop nests. I tried to expose it through minimal IR but I apologize if it still looks not convenient enough to reproduce.
> > > > > > From modifying your original test case, we can show the same problem with an easier logic. Instead of 2 steps and 3 level nested loop, we can show with only attempting to interchange once with a 2 level nested loop. 
> > > > > > Can you simplify all the comments/descriptions and test case?
> > > > > > ```
> > > > > > @a = common global i32 0, align 4
> > > > > > @d = common dso_local local_unnamed_addr global [1 x [6 x i32]] zeroinitializer, align 4
> > > > > > 
> > > > > > define void @innermost_latch_uses_values_in_middle_header() {
> > > > > > entry:
> > > > > >   %0 = load i32, i32* @a, align 4
> > > > > >   %b = add i32 80, 1
> > > > > >   br label %outermost.header
> > > > > > 
> > > > > > outermost.header:                                 ; preds = %outermost.latch, %entry
> > > > > >   %indvar.outermost = phi i32 [ 10, %entry ], [ %indvar.outermost.next, %outermost.latch ]
> > > > > >   %tobool71.i = icmp eq i32 %0, 0
> > > > > >   br i1 %tobool71.i, label %innermost.header.preheader, label %outermost.latch
> > > > > > 
> > > > > > innermost.header.preheader:                       ; preds = %outermost.header
> > > > > >   br label %innermost.header
> > > > > > 
> > > > > > innermost.header:                                 ; preds = %innermost.header.preheader, %innermost.latch.split
> > > > > >   %indvar.innermost = phi i64 [ %1, %innermost.latch.split ], [ 4, %innermost.header.preheader ]
> > > > > >   %indvar.middle.wide = zext i32 %b to i64 ; a def in the middle header
> > > > > >   %arrayidx9.i = getelementptr inbounds [1 x [6 x i32]], [1 x [6 x i32]]* @d, i64 0, i64 %indvar.innermost, i64 %indvar.innermost
> > > > > >   store i32 0, i32* %arrayidx9.i, align 4
> > > > > >   br label %innermost.latch
> > > > > > 
> > > > > > innermost.latch:                                  ; preds = %innermost.body
> > > > > >   %indvar.innermost.next = add nsw i64 %indvar.innermost, 1
> > > > > >   %tobool5.i = icmp eq i64 %indvar.innermost.next, %indvar.middle.wide
> > > > > >   br label %innermost.latch.split
> > > > > > 
> > > > > > innermost.latch.split:                            ; preds = %middle.latch
> > > > > >   %indvar.middle.wide.lcssa = phi i64 [ %indvar.middle.wide, %innermost.latch ]
> > > > > >   %1 = add nsw i64 %indvar.innermost, 1
> > > > > >   %2 = icmp eq i64 %1, %indvar.middle.wide.lcssa
> > > > > >   br i1 %2, label %outermost.latch.loopexit, label %innermost.header
> > > > > > 
> > > > > > outermost.latch.loopexit:                         ; preds = %innermost.latch.split
> > > > > >   br label %outermost.latch
> > > > > > 
> > > > > > outermost.latch:                                  ; preds = %outermost.latch.loopexit, %outermost.header
> > > > > >   %indvar.outermost.next = add nsw i32 %indvar.outermost, -5
> > > > > >   %tobool.i = icmp eq i32 %indvar.outermost.next, 0
> > > > > >   br i1 %tobool.i, label %outermost.exit, label %outermost.header
> > > > > > 
> > > > > > outermost.exit:                                   ; preds = %outermost.latch
> > > > > >   ret void
> > > > > > }
> > > > > > ```
> > > > > > 
> > > > > > And it looks to me that we could replace all uses of `%indvar.middle.wide.lcssa` with `%indvar.middle.wide` and remove `%indvar.middle.wide.lcssa` to continue interchange. Have you thought about that?
> > > > > Another idea is to clone the incoming values of the lcssa phi, as we know it is loop invariance, or else it would bail out by `isLoopStructureUnderstood`.
> > > > @Whitney My apologies for not being able to reply sooner.
> > > > 
> > > > Thanks a lot for working out this simpler test case! It does expose the same problem, which is that some variables used in the new outer latch are not always available after interchange.
> > > > 
> > > > A minor concern for this test is that, usually `%indvar.middle.wide.lcssa` in the original inner latch is not supposed to exist since its incoming value is defined in the same loop whereas an lcssa phi is supposed to have incoming values from a loop that is "one-level inner" in the loop nest. So I guess in reality we would not see such patterns as shown in the simpler test, and I completely agree with you that in this case we can just replace all uses of `%indvar.middle.wide.lcssa` with `%indvar.middle.wide` so that interchange would not cause any problem.
> > > > 
> > > > Originally I used a triply-nested loop as test because the problematic lcssa phi in the intermediate middle latch that has incoming values from the intermediate inner header is a valid lcssa phi and should exist. The corresponding IR (the provided test after 1st interchange) is shown below, where we see the lcssa phi `%indvar.middle.wide.lcssa` in `innermost.latch.split` (which is the intermediate middle latch, and acts as the inner loop latch when doing 2nd interchange) is actually valid and should be there. After interchange `innermost.latch.split` would become the final outermost latch, and its incoming value, i.e., `%indvar.middle.wide` is not always available at the final outermost latch. If we follow the edge directly from the final outermost header to the final outermost latch, `%indvar.middle.wide` would not be available.
> > > > 
> > > > ```
> > > > define void @innermost_latch_uses_values_in_middle_header_after_first_interchange() {
> > > > entry:
> > > >   %0 = load i32, i32* @a, align 4
> > > >   %b = add i32 80, 1
> > > >   br label %outermost.header
> > > > 
> > > > outermost.header:                                 ; preds = %outermost.latch, %entry
> > > >   %indvar.outermost = phi i32 [ 10, %entry ], [ %indvar.outermost.next, %outermost.latch ]
> > > >   %tobool71.i = icmp eq i32 %0, 0
> > > >   br i1 %tobool71.i, label %innermost.header.preheader, label %outermost.latch
> > > > 
> > > > middle.header.preheader:                          ; preds = %innermost.header
> > > >   br label %middle.header
> > > > 
> > > > middle.header:                                    ; preds = %middle.header.preheader, %middle.latch
> > > >   %indvar.middle = phi i64 [ %indvar.middle.next, %middle.latch ], [ 4, %middle.header.preheader ]
> > > >   %indvar.middle.wide = zext i32 %b to i64
> > > >   br label %innermost.header.split
> > > > 
> > > > innermost.header.preheader:                       ; preds = %outermost.header
> > > >   br label %innermost.header
> > > > 
> > > > innermost.header:                                 ; preds = %innermost.header.preheader, %innermost.latch.split
> > > >   %indvar.innermost = phi i64 [ %1, %innermost.latch.split ], [ 4, %innermost.header.preheader ]
> > > >   br label %middle.header.preheader
> > > > 
> > > > innermost.header.split:                           ; preds = %middle.header
> > > >   br label %innermost.body
> > > > 
> > > > innermost.body:                                   ; preds = %innermost.header.split
> > > >   %arrayidx9.i = getelementptr inbounds [1 x [6 x i32]], [1 x [6 x i32]]* @d, i64 0, i64 %indvar.innermost, i64 %indvar.middle
> > > >   store i32 0, i32* %arrayidx9.i, align 4
> > > >   br label %innermost.latch
> > > > 
> > > > innermost.latch:                                  ; preds = %innermost.body
> > > >   %indvar.innermost.next = add nsw i64 %indvar.innermost, 1
> > > >   %tobool5.i = icmp eq i64 %indvar.innermost.next, %indvar.middle.wide
> > > >   br label %middle.latch
> > > > 
> > > > innermost.latch.split:                            ; preds = %middle.latch
> > > >   %indvar.middle.wide.lcssa = phi i64 [ %indvar.middle.wide, %middle.latch ]
> > > >   %1 = add nsw i64 %indvar.innermost, 1
> > > >   %2 = icmp eq i64 %1, %indvar.middle.wide.lcssa
> > > >   br i1 %2, label %outermost.latch.loopexit, label %innermost.header
> > > > 
> > > > middle.latch:                                     ; preds = %innermost.latch
> > > >   %indvar.middle.next = add nsw i64 %indvar.middle, -1
> > > >   %tobool2.i = icmp eq i64 %indvar.middle.next, 0
> > > >   br i1 %tobool2.i, label %innermost.latch.split, label %middle.header
> > > > 
> > > > outermost.latch.loopexit:                         ; preds = %innermost.latch.split
> > > >   br label %outermost.latch
> > > > 
> > > > outermost.latch:                                  ; preds = %outermost.latch.loopexit, %outermost.header
> > > >   %indvar.outermost.next = add nsw i32 %indvar.outermost, -5
> > > >   %tobool.i = icmp eq i32 %indvar.outermost.next, 0
> > > >   br i1 %tobool.i, label %outermost.exit, label %outermost.header
> > > > 
> > > > outermost.exit:                                   ; preds = %outermost.latch
> > > >   ret void
> > > > }
> > > > ```
> > > > 
> > > > The key point here is that usually lcssa phis are not supposed to exist in loop latches but are supposed to exist in loop exist blocks. However in scenarios I described in this patch, lcssa phis do exist in loop latches and they are perfectly valid lcssa phis. Therefore the patch aims at dealing with that scenario.
> > > > 
> > > > Because of my recent update in `isLoopStructureUnderstood`, that problem does not cause segmentation fault for now but just bails from `isLoopStructureUnderstood`. This is because of insufficient capability of SCEV to determine `%indvar.middle.wide` to be loop invariant. If SCEV is improved in the future such that we would not bail from `isLoopStructureUnderstood`, the segmentation fault would appear again.
> > > > 
> > > > I'm wondering if what I described above looks clear to you? In terms of moving forward, I'm leaning toward sticking to the test case I provided (i.e., keeping the patch as it is) because that is something that occurs in reality. But if you believe we could use your simplified test case and update all the comments/descriptions correspondingly, I could do that as well.
> > > I am thinking that there are two cases:
> > > 1. If SCEV can determine %indvar.middle.wide to be loop invariant, then we should clone/hoist %indvar.middle.wide out, so it is available for %indvar.middle.wide.lcssa.
> > > 2. If SCEV cannot determine %indvar.middle.wide to be loop invariant, then we should bail out from `isLoopStructureUnderstood`. 
> > > 
> > > then we don't need `areInnerLoopLatchPHIsSupported`. Is my understanding correct?
> > > 
> > > I see your point of your test case, as it is closer to the reality. The problem is the comments are quite complex to think about for readers. 
> > Thanks for the comment, yes I agree with `2`; for `1` I mostly agree that `%indvar.middle.wide` should be available for `%indvar.middle.wide.lcssa` if it is hoisted. Nevertheless the loop interchange pass seems not working as expected -- I debugged the test case you provided, SCEV can actually determine `%indvar.middle.wide` is invariant so it does not bail from `isLoopStructureUnderstood`. Even if we run '-licm' before '-loop-interchange' and hoist `%indvar.middle.wide`, this test still crashes. The reason seems to be that, loop interchange pass did not take this case into consideration so when it moves BBs and instructions around, it breaks the IR.
> > 
> > Digging into that crash may need extra effort, for now adding `areInnerLoopLatchPHIsSupported` may make the pass more robust and crash less often. If we determine to merge this patch, I can try to make the comments more straightforward and probably use your test case instead of mine, to make the logic easier to understand.
> > 
> > Internally we also see a nested loop with 6 nesting levels fails interchange due to similar reasons. It did not bail from `isLoopStructureUnderstood` and it crashes after loop interchange transformation. So it seems there's other cases apart from `1` and `2`. I have not yet looked deep into the details, but applying this patch internally avoids that crash.
> I am fine with having this patch in in the meantime, but let's add a TODO to actually handle it in loop interchange, instead of bailing out.
Thanks, I'll make the comments more straightforward, use your test case and add a TODO to describe what needs to be done next. If you have other comments or suggestions, please let me know.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102300/new/

https://reviews.llvm.org/D102300



More information about the llvm-commits mailing list