[PATCH] D76132: [LoopUnrollAndJam] Changed safety checks to consider more than 2-levels loop nest.

Thu Mar 19 05:54:52 PDT 2020

Whitney marked 7 inline comments as done.
Whitney added inline comments.

================
Comment at: llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp:914
   // Make sure we can move all instructions we need to before the subloop
+  BasicBlock *Header = L->getHeader();
+  BasicBlock *Latch = L->getLoopLatch();
----------------
dmgreen wrote:
> Whitney wrote:
> > dmgreen wrote:
> > > Whitney wrote:
> > > > dmgreen wrote:
> > > > > Should this have a check for a 2 deep loop nest at the moment (like before), if the remainder of the analysis/transform code hasn't been updated yet? It looks like the count calculation might just exclude anything with multiple subloop blocks at the moment anyway, so is possibly not a problem in practice, without pragma's.
> > > > Not sure I understand, this check is already like before. 
> > > Sorry, the actual line I was pointing to was semi-random. That wasn't clear.
> > > 
> > > Does the processHeaderPhiOperands check below need to check each level? IIRC it's testing data-dependencies (as in ssa/use-def dependencies, as opposed to the memory dependencies in checkDependencies. Can we physically move any instruction we need from aft to fore). If we end up moving multiple levels past one another, do we have to make the same checks at each level?
> > > 
> > > My general point was that some of the code still only handles 2-deep loop nests. Should we have a check somewhere (perhaps with a fixme next it) that still tests for that condition, until the rest of the code has caught up?
> > ```
> >     Because of the way we rearrange basic blocks, we also require that
> >     the Fore blocks of L on all unrolled iterations are safe to move before the
> >     blocks of the direct child of L of all iterations. So we require that the
> >     phi node looping operands of ForeHeader can be moved to at least the end of
> >     ForeEnd, so that we can arrange cloned Fore Blocks before the subloop and
> >     match up Phi's correctly.
> > ```
> > As we are only unrolling L, not its child, we don't need to move instructions from non-L AftBlock to non-L ForeBlock, so we don't need to check if the moves are safe. 
> > 
> > > My general point was that some of the code still only handles 2-deep loop nests. Should we have a check somewhere (perhaps with a fixme next it) that still tests for that condition, until the rest of the code has caught up?
> > I modified all checks I found needed in `isSafeToUnrollAndJam`, am I missing something?
> In the summary you have B' moving past D. And we need to be sure that B' doesn't depend on anything from D.
> 
> I think of it as B(1,0) needs to move past D(0,0). the "j" level loop isn't unrolled, but there still some movement needed at the "i" level.
Here we are talking about def-use dependences.
If an instruction in B' (x2) depends on an instruction in D (y),
then there must be an instruction in B (x) that depends on instruction y in D,
because B' is cloned from B.
```
B:
  x = phi [y, D]...
D:
  y =
```
As we are placing B' after B, and y is available to B, y must also be available to B'.

Please correct me if I am wrong.

================
Comment at: llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll:1
-; RUN: opt -basicaa -loop-unroll-and-jam -allow-unroll-and-jam -unroll-and-jam-count=4 < %s -S | FileCheck %s
-; RUN: opt -aa-pipeline=basic-aa -passes='unroll-and-jam' -allow-unroll-and-jam -unroll-and-jam-count=4 < %s -S | FileCheck %s
+; RUN: opt -da-disable-delinearization-checks -basicaa -loop-unroll-and-jam -allow-unroll-and-jam -unroll-and-jam-count=4 < %s -S | FileCheck %s
+; RUN: opt -da-disable-delinearization-checks -aa-pipeline=basic-aa -passes='unroll-and-jam' -allow-unroll-and-jam -unroll-and-jam-count=4 < %s -S | FileCheck %s
----------------
dmgreen wrote:
> Why is this now using da-disable-delinearization-checks, and why have some of these existing tests been changed to use constant size arrays?
`-da-disable-delinearization-checks` is added so that fixed-size multi-dimensional arrays are delinearized more accurately. See
https://reviews.llvm.org/D72178 for a more detailed explanation.
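
As a rough illustration of why subscript range checks matter for delinearization (this is my reading of the option, not something taken from D72178; `RowWidth`, `linearize` and `delinearize` are hypothetical names): with a fixed-size row, a column subscript outside [0, RowWidth) lands in a neighbouring row, so a flat offset only maps back to a unique (row, column) pair when the subscripts are known to be in range.

```cpp
#include <cstddef>

// Hypothetical row width, chosen only for illustration.
constexpr std::ptrdiff_t RowWidth = 100;

// Flat element offset of A[row][col] for an array of rows of RowWidth elements.
constexpr std::ptrdiff_t linearize(std::ptrdiff_t row, std::ptrdiff_t col) {
  return row * RowWidth + col;
}

struct Subscripts {
  std::ptrdiff_t row;
  std::ptrdiff_t col;
};

// Recover (row, col) from a flat offset. This is only the subscript pair
// the source used if the column subscript was in [0, RowWidth).
constexpr Subscripts delinearize(std::ptrdiff_t offset) {
  return {offset / RowWidth, offset % RowWidth};
}

// An out-of-range column subscript aliases an element of the previous row.
static_assert(linearize(1, -1) == linearize(0, RowWidth - 1),
              "A[1][-1] and A[0][RowWidth-1] share one flat offset");

// In-range subscripts round-trip.
static_assert(delinearize(linearize(2, 3)).row == 2 &&
                  delinearize(linearize(2, 3)).col == 3,
              "in-range subscripts are recovered exactly");
```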

> why have some of these existing tests been changed to use constant size arrays

They were originally testing single-dimensional arrays, which may not be ideal for testing the sub-sub portion of the code.

================
Comment at: llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll:363
 ; CHECK-NOT: %j.1 = phi
-define void @sub_sub_less(i32* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {
 entry:
----------------
This test was originally testing
```
for i
  for j
    A[i]
    A[i-1]
```
which should be **safe** to unroll and jam. 

I think it is actually meant to test the sub-with-sub case where the dependence distance of the inner loop is less.
```
for i
  for j
    A[i][j]
    A[i+1][j-1]
```

================
Comment at: llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll:401
 ; CHECK: %j.1 = phi
-define void @sub_sub_eq(i32* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {
 entry:
----------------
This test was originally testing
```
for i
  for j
    A[i]
    A[i]
```

I think it is actually meant to test the sub-with-sub case where the dependence distance of the inner loop is eq.
```
for i
  for j
    A[i][j]
    A[i+1][j]
```

================
Comment at: llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll:439
-; CHECK-NOT: %j.1 = phi
-define void @sub_sub_more(i32* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {
 entry:
----------------
This test was originally testing
```
for i
  for j
    A[i]
    A[i+1]
```
which should be **safe** to unroll and jam. 

I think it is actually meant to test the sub-with-sub case where the dependence distance of the inner loop is more.
```
for i
  for j
    A[i][j]
    A[i+1][j+1]
```
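
To make the less/eq/more shapes concrete, here is a small standalone C++ sketch. It is not part of the patch: `body`, `runOriginal`, `runUnrollAndJam2`, `sameResult`, `checkShapes`, the array sizes and the stored values are all made up for illustration. It assumes both accesses are stores, runs the original nest and a hand-written unroll-and-jam by 2 of the outer loop, and compares the final arrays. Under that assumption only the inner distance of -1 (the "less" shape) changes the result, because jamming runs iteration (i+1, j-1) before iteration (i, j).

```cpp
#include <array>
#include <cassert>

constexpr int Rows = 4; // outer trip count; even, so unrolling by 2 needs no remainder loop
constexpr int Cols = 6;

// One extra row because the second store touches row i + 1.
using Grid = std::array<std::array<int, Cols>, Rows + 1>;

// One iteration of the inner loop body: stores to A[i][j] and A[i+1][j+dj].
inline void body(Grid &A, int i, int j, int dj) {
  A[i][j] = 1;
  A[i + 1][j + dj] = 100 * i + j + 2; // never equal to 1, so store order is observable
}

// Original loop nest. j runs over [1, Cols - 1) so that j + dj stays in range for dj in {-1, 0, 1}.
Grid runOriginal(int dj) {
  Grid A{};
  for (int i = 0; i < Rows; ++i)
    for (int j = 1; j < Cols - 1; ++j)
      body(A, i, j, dj);
  return A;
}

// Outer loop unrolled by 2 with the two inner loops jammed into one.
Grid runUnrollAndJam2(int dj) {
  Grid A{};
  for (int i = 0; i < Rows; i += 2)
    for (int j = 1; j < Cols - 1; ++j) {
      body(A, i, j, dj);
      body(A, i + 1, j, dj);
    }
  return A;
}

// True if the hand-transformed nest leaves memory in the same final state.
bool sameResult(int dj) { return runOriginal(dj) == runUnrollAndJam2(dj); }

// The three shapes from the comments above.
void checkShapes() {
  assert(!sameResult(-1)); // A[i][j], A[i+1][j-1]: store order is reversed
  assert(sameResult(0));   // A[i][j], A[i+1][j]:   store order is preserved
  assert(sameResult(1));   // A[i][j], A[i+1][j+1]: store order is preserved
}
```

This only compares final memory for a store/store pair; it says nothing about what the dependence analysis itself reports for these tests.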

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76132/new/

https://reviews.llvm.org/D76132