[PATCH] D98263: [LoopInterchange] fix tightlyNested() in LoopInterchange legality

Wed Mar 10 22:51:38 PST 2021

congzhe added a comment.

In D98263#2618403 <https://reviews.llvm.org/D98263#2618403>, @TaWeiTu wrote:

> Hi @congzhe, thanks for the fix!
>
> I think it'd probably be better to keep the `LoopNest::checkLoopsStructure` and add the additional checks of branch instructions for now.
> By doing so, future changes to `LoopNest` will also apply in `LoopInterchange`, which is what we are trying to achieve I think (by making the definition of perfectly-nested loop in sync).
> The check can be removed in the future, when `LoopNest` is eventually taught to correctly identify guard branches (`singleSucc` is currently considered to be the guard branch of the inner loop, but the loops should only be considered perfectly-nested if the guard branch corresponds to the one added by `LoopRotate` IMO, which is not true in this case).
>
> What do you think?

Thanks for the reply! I see your point.

I am just not entirely sure whether we should keep `LoopNest::checkLoopsStructure()`  in `tightlyNested()` in loop interchange. As I described above, `LoopNest::checkLoopsStructure()` does comprehensive checks and some of the scenarios that it considered may not be applicable to loop interchange, because the transformation phase in loop interchange is not comprehensive enough to handle all those scenarios. The scenario with the inner loop guard issue as I added in this patch is only one of those, I can think of other scenarios that can pass `LoopNest::checkLoopsStructure()` but fails loop interchange transformation phase. I have listed two of them below.

Therefore even we handle the inner loop guard appropriately, there's other scenarios that would fail if we do keep `LoopNest::checkLoopsStructure()`. Essentially I think from practical point of view, it does make sense that `tightlyNest()` is not entirely the same as `LoopNest::checkLoopsStructure()`, since `tightlyNest()` is specifically tailored for loop interchange hence has its own logic in checking the loop structure.

**Scenario 1**:

  @A = common global [100 x [100 x i64]] zeroinitializer
  @B = common global [100 x i64] zeroinitializer

  define void @crash(i64 %k, i64 %N, i64 signext %ny) {
  entry:
    br label %for1.header

  for1.header:
    %j23 = phi i64 [ 0, %entry ], [ %j.next24, %for1.inc10 ]
    %cmp21 = icmp slt i64 0, %ny
    br i1 %cmp21, label %preheader.j, label %for.j.end

  preheader.j:  
    br label %for2

  for2:
    %j = phi i64 [ %j.next, %for.j.inc ], [ 0, %preheader.j ]
    %arrayidx5 = getelementptr inbounds [100 x [100 x i64]], [100 x [100 x i64]]* @A, i64 0, i64 %j, i64 %j23
    %lv = load i64, i64* %arrayidx5
    %add = add nsw i64 %lv, %k
    store i64 %add, i64* %arrayidx5
    br label %for.j.inc

  for.j.inc:                                       
    %j.next = add nuw nsw i64 %j, 1
    %exitcond = icmp eq i64 %j, 99
    br i1 %exitcond, label %for2, label %for.j.end_crit_edge

  for.j.end_crit_edge:                              
    %split = phi i64 [ %add, %for.j.inc ]
    br label %for.j.end

  for.j.end:                                        
    ;%res.1.lcssa = phi i64 [ %split, %for.j.end_crit_edge ], [ %k, %for1.header ]
    br label %for1.inc10

  for1.inc10:
    %j.next24 = add nuw nsw i64 %j23, 1
    %exitcond26 = icmp eq i64 %j23, 99
    br i1 %exitcond26, label %for.end12, label %for1.header

  for.end12:
    ret void
  }

**Scenario 2**

  @A = common global [100 x [100 x i64]] zeroinitializer
  @B = common global [100 x i64] zeroinitializer

  define void @crash(i64 %k, i64 %N, i64 signext %ny) {
  entry:
    br label %for1.header

  for1.header:
    %j23 = phi i64 [ 0, %entry ], [ %j.next24, %for1.inc10 ]    
    br label %singleSucc

  singleSucc:
    br label %preheader.j

  preheader.j:  
    br label %preheader.j2

  preheader.j2:  
    br label %preheader.j3

  preheader.j3:  
    br label %for2

  for2:
    %j = phi i64 [ %j.next, %for2 ], [ 0, %preheader.j3 ]
    %arrayidx5 = getelementptr inbounds [100 x [100 x i64]], [100 x [100 x i64]]* @A, i64 0, i64 %j, i64 %j23
    %lv = load i64, i64* %arrayidx5
    %add = add nsw i64 %lv, %k
    store i64 %add, i64* %arrayidx5
    %j.next = add nuw nsw i64 %j, 1
    %exitcond = icmp eq i64 %j, 99
    br i1 %exitcond, label %for1.inc10, label %for2

  for1.inc10:
    %j.next24 = add nuw nsw i64 %j23, 1
    %exitcond26 = icmp eq i64 %j23, 99
    br i1 %exitcond26, label %for.end12, label %for1.header

  for.end12:
    ret void
  }

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98263/new/

https://reviews.llvm.org/D98263