[PATCH] D76132: [LoopUnrollAndJam] Changed safety checks to consider more than 2-levels loop nest.

Mon Mar 23 03:48:22 PDT 2020

dmgreen added inline comments.

================
Comment at: llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp:914
   // Make sure we can move all instructions we need to before the subloop
+  BasicBlock *Header = L->getHeader();
+  BasicBlock *Latch = L->getLoopLatch();
----------------
Whitney wrote:
> dmgreen wrote:
> > Whitney wrote:
> > > dmgreen wrote:
> > > > Whitney wrote:
> > > > > dmgreen wrote:
> > > > > > Whitney wrote:
> > > > > > > dmgreen wrote:
> > > > > > > > Should this have a check for a 2 deep loop nest at the moment (like before), if the remainder of the analysis/transform code hasn't been updated yet? It looks like the count calculation might just exclude anything with multiple subloop blocks at the moment anyway, so is possibly not a problem in practice, without pragma's.
> > > > > > > Not sure I understand, this check is already like before. 
> > > > > > Sorry, the actual line I was pointing to was semi-random. That wasn't clear.
> > > > > > 
> > > > > > Does the processHeaderPhiOperands check below need to check each level? IIRC it's testing data-dependencies (as in ssa/use-def dependencies, as opposed to the memory dependencies in checkDependencies. Can we physically move any instruction we need from aft to fore). If we end up moving multiple levels past one another, do we have to make the same checks at each level?
> > > > > > 
> > > > > > My general point was that some of the code still only handles 2-deep loop nests. Should we have a check somewhere (perhaps with a fixme next it) that still tests for that condition, until the rest of the code has caught up?
> > > > > ```
> > > > >     Because of the way we rearrange basic blocks, we also require that
> > > > >     the Fore blocks of L on all unrolled iterations are safe to move before the
> > > > >     blocks of the direct child of L of all iterations. So we require that the
> > > > >     phi node looping operands of ForeHeader can be moved to at least the end of
> > > > >     ForeEnd, so that we can arrange cloned Fore Blocks before the subloop and
> > > > >     match up Phi's correctly.
> > > > > ```
> > > > > As we are only unrolling L, not its child, we don't need to move instructions from non-L AftBlock to non-L ForeBlock, so we don't need to check if the moves are safe. 
> > > > > 
> > > > > > My general point was that some of the code still only handles 2-deep loop nests. Should we have a check somewhere (perhaps with a fixme next it) that still tests for that condition, until the rest of the code has caught up?
> > > > > I modified all checks I found needed in `isSafeToUnrollAndJam`, am I missing something?
> > > > In the summary you have B' moving past D. And we need to be sure that B' doesn't depend on anything from D.
> > > > 
> > > > I think of it as B(1,0) needs to move past D(0,0). the "j" level loop isn't unrolled, but there still some movement needed at the "i" level.
> > > Here we are talking about def-use dependence. 
> > > If an instruction in B' (x2) depend on an instruction in D (y),
> > > means there must be an instruction in B (x) that depend on instruction y in D,
> > > as B' is clone from B.
> > > ```
> > > B:
> > >   x = phi [y, D]...
> > > D:
> > >   y =
> > > ```
> > > As we are placing B' after B, and y is available for B, then y must also be available for B'.
> > > 
> > > Please correct me if I am wrong.
> > Yeah, kind of. But with an extra level of unrolling in there. Using your ` notation from before, normal unrolling would do:
> > ```
> > B:
> >   x = phi [y', D'], [x, A]
> > D:
> >   y =
> > B':
> >   x' = phi [y, D]
> > D:
> >   y' =
> > ```
> > As we need to move B' before D, we also need to be able to hoist y into B, so the phi in B' can point at the correct value. That's what this processHeaderPhiOperands is doing. Note the x` phi only has one operand after unrolling, so will be simplified away.
> > 
> > It comes up quite a bit from the increment of the IV variable. The second IV will become an increment of the first after we have pushed the add up into the header from the latch.
> We are still only unrolling one loop each time, but fuse LoopDepth-1 times. 
> Instructions in B should be the same as B', except all `i` changed to `i+1`, when unrolling by 2. 
> Likewise, all instructions in D should be the same as D', except all `i` changed to `i+1`, when unrolling by 2. 
> In that case, how can an instruction defined in D' used in B?
> I think processHeaderPhiOperands is used for the induction variable of the unrolled loop. 
Ah I see. Even though we are moving the blocks past one another, at that level there will not be any instructions that needs to move. OK fair enough :)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76132/new/

https://reviews.llvm.org/D76132