[PATCH] D75233: [LoopTerminology] LCSSA Form

Mon Mar 2 13:19:18 PST 2020

baziotis marked 3 inline comments as done.
baziotis added inline comments.

================
Comment at: llvm/docs/LoopTerminology.rst:181-184
+      if (c)
+        X1 = ...
+      else
+        X2 = ...
----------------
Meinersbur wrote:
> Maybe remove the if-statement here, it is specific to loop unswitching
Yes, I'll give a simple example, then an example specific to loop unswitching.

================
Comment at: llvm/docs/LoopTerminology.rst:211-212
+This form is ensured by the LCSSA (:ref:`-loop-simplify <passes-lcssa>`)
+pass and is added automatically by the LoopPassManager when
+scheduling a LoopPass.
+After the loop optimizations are done, these extra phi nodes
----------------
Meinersbur wrote:
> baziotis wrote:
> > Meinersbur wrote:
> > > Note that this applies to the new pass manager only (by `FunctionToLoopPassAdaptor`): This does not happen with the legacy (current default) pass manager.
> > Ok, I'm not very familiar with pass managers. So, in the old pass manager, you have to invoke LCSSA automatically or something?
> In the legacy pass manger, those passes are added as dependency of the loop passes. The Helper function [[ https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/LoopUtils.cpp#L141 | `getLoopAnalysisUsage` ]] add them  them as requirements and should be called by all loop passes in `getAnalysisUsage`. Might be useful to mention that.
Ok, thanks, I can't really find much info about the pass managers. I'll add that.

================
Comment at: llvm/docs/LoopTerminology.rst:248-251
+If we did not have Loop Closed SSA form, we would have had to
+do deep analysis of the control flow graph to figure out where
+to place the X4 phi node.  With it, we just loop for these
+PHIs and update them.
----------------
Meinersbur wrote:
> baziotis wrote:
> > Meinersbur wrote:
> > > baziotis wrote:
> > > > Meinersbur wrote:
> > > > > For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the dominating exit block) as well. LCSSA makes it a two-step process. I don't think it would be complicated to do it in a single step on-demand when unswitching. There are other advantages.
> > > > > 
> > > > > LLVM uses a linked linked list to list all the uses of an instruction. There is a convenience function `replaceAllUsesWith` (or short: RAUW) does this replacement. For loops however, it had to distinguish between inside/outside users (1), but this is no different that when versioning a non-loop BasicBlock. For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the exit block) as well.
> > > > > 
> > > > > I think the primary reason is that loop analysis become more independent (3). Say we perform a loop analysis on all loops in a function that stores reference to `llvm::Value`s used in the loops, such as ScalarEvolution. In a second step, we transform all loops using the analysis' result which does RAUW. Without LCSSA it might replace values in other loops and invalidate the analysis already applied to the. With LCSSA, only value of the PHINode in the exit block is changed, but the PHINode is the same instance used in other loops. However, if we have loop transformations that transform inner as well as outer loops, we still need to handle the case that transforming the inner (respectively outer) loop may invalidate an analysis on the other.
> > > > > 
> > > > > Another reason is that with LCSSA, `ScalarEvolution::getSCEV` is sufficient (2). Otherwise, one needs to use `getSCEVAtScope` to specify whether we are using the SCEV inside or outside. Polly does not require LCSSA, hence uses mostly `getSCEVAtScope`.
> > > > > 
> > > > > GCC has documentation about the advantages of LCSSA: https://gcc.gnu.org/onlinedocs/gccint/LCSSA.html . The numbers (1-3) correspond to the bullet list.
> > > > Thank you for the explanation! Some questions / comments:
> > > > 
> > > > > For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the dominating exit block) as well. LCSSA makes it a two-step process. I don't think it would be complicated to do it in a single step on-demand when unswitching. There are other advantages.
> > > > 
> > > > That was actually [[ http://nondot.org/sabre/LLVMNotes/LoopOptimizerNotes.txt | a note from Chris Lattner ]] (I couldn't have thought that haha). What I understood isn't that we can avoid deep analysis completely. Rather, deep analysis will be done for LCSSA. But then, loop optimizations have to preserve LCSSA (which from what I understand should be relatively easy), so when the time comes to do loop unswitching, LCSSA is there and helps you do it quick.
> > > > 
> > > > > There is a convenience function replaceAllUsesWith (or short: RAUW) does this replacement.
> > > > 
> > > > "this replacement" being loop unswitching ?
> > > > 
> > > > >  For loops however, it had to distinguish between inside/outside users (1), but this is no different that when versioning a non-loop BasicBlock
> > > > 
> > > > Why exactly the distinction ? What I understand is that, when doing RAUW on a basic block or a bunch of basic blocks (or just one), no matter if they're a loop or not, if you know which values are live outside them (e.g. using single entry PHI nodes), you make your life easier. Because you replace the inside uses and those on the PHI nodes on the boundary. Compared to if you didn't have those,
> > > > where you would have to replace the uses across the whole function.
> > > > 
> > > > > Without LCSSA it might replace values in other loops and invalidate the analysis already applied to the. With LCSSA, only value of the PHINode in the exit block is changed, but the PHINode is the same instance used in other loops
> > > > 
> > > > Wow, that was very smart.
> > > > 
> > > > > Another reason is that with LCSSA, ScalarEvolution::getSCEV is sufficient (2). Otherwise, one needs to use getSCEVAtScope to specify whether we are using the SCEV inside or outside.
> > > > 
> > > > I'm not sure I got that completely (I'm not familiar with the interface of SCEV in LLVM). I guess because no instruction of the loop is used outside of it, so SCEV for a loop can be limited to the loop. Otherwise, a value may be used wherever and you have to include it in the scope if you are to do SCEV correctly.
> > > > 
> > > Since I haven't implemented a loop unswitch pass myself, I can't really argue how difficult it it, so I don't request to change it. 
> > > 
> > > Interestingly, the [[ https://gcc.gnu.org/ml/gcc-patches/2004-03/msg02212.html | source referenced ]] by the note just mentions it as "non-trivial update". It also adds another advantage of LCSSA: Induction variable uses in nested loop. Although SCEV should have no problem with, it's just two AddRecExpr in an SECExpr.
> > > 
> > > Maybe you could add the other advantages as well?
> > > 
> > > > "this replacement" being loop unswitching ?
> > > 
> > > Replacing uses of an instruction result outside the loop by the PHINode that 'combines' the result of the unswitched loops.
> > > 
> > > > Why exactly the distinction ? 
> > > 
> > > It's surely fewer values to replace (one in the PHI changes all uses outside the loop).  Whether a value is used outside can be checked using `L->contains(Use)` IMHO this does not justify LCSSA which searches and replaces all outside uses with the PHI unconditionally.
> > > 
> > > > I'm not sure I got that completely (I'm not familiar with the interface of SCEV in LLVM).
> > > 
> > > The SCEV for inside the loop references the induction variable in an AddRec expression. This is only valid to expand inside the loop. When outside the loop, you need the expression representing the exit value of the loop, or SCEVUnkown when the exit value cannot be computed, e.g. because the loop exit condition is not known.
> > > Interestingly, the source referenced by the note just mentions it as "non-trivial update". It also adds another advantage of LCSSA: Induction variable uses in nested loop. Although SCEV should have no problem with, it's just two AddRecExpr in an SECExpr.
> > 
> > I hadn't seen that, thanks. What is a `SECExpr`?
> > 
> > > Maybe you could add the other advantages as well?
> > Yes. You mean just the one I guess, about the induction variables.
> > 
> > > Replacing uses of an instruction result outside the loop by the PHINode that 'combines' the result of the unswitched loops.
> > 
> > Oh, I got what you mean. Even if you have LCSSA before the unswitching, //after// it, you still have to go and update all the uses
> > with the new PHINode that combines the 2 PHIs of the versioned loops.
> > 
> > > Whether a value is used outside can be checked using `L->contains(Use)`
> > 
> > Are you sure? `L->contains()` I think doesn't get `Use`. Given a `Value`, it seems you have to loop through all the uses, get the User and check if this is inside the loop. Something like what [[ https://llvm.org/doxygen/SimpleLoopUnswitch_8cpp_source.html#l00141 | replaceLoopInvariantUses() ]] does.
> > 
> > > IMHO this does not justify LCSSA which searches and replaces all outside uses with the PHI unconditionally.
> > I agree.
> > 
> > > The SCEV for inside the loop references...
> > Got it, thanks!
> > What is a SECExpr?
> 
> `ScalarEvolution::SCEVExpr`. Sorry for the typo.
> 
> > Yes. You mean just the one I guess, about the induction variables.
> 
> Why not all of them?
> 
> > Even if you have LCSSA before the unswitching, after it, you still have to go and update all the uses
> with the new PHINode that combines the 2 PHIs of the versioned loops.
> 
> SimpleLoopUnswitch does as you described: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp#L255
> 
> ```
>   for (...) {
>     %val = ...
>   } 
>   %lcssa = phi [%val]
> ```
> to 
> ```
>   if (c) {
>     for (...) {
>       %val1 = ...
>     }
>     %newlcssa1 = phi [%val1] // new PHI
>   } else {
>     for (...) {
>       %val2 = ...
>     }
>     %newlcssa2 = phi [%val2] // new PHI
>   }
>   %lcssa = phi [%newlcssa1, %newlcssa2] // re-use old LCSSA node  here, not an LCSSA node any more
> ```
> 
> 
> > Are you sure? L->contains() I think doesn't get Use. 
> 
> You'd use it on the instruction that uses the value.
> 
> > Given a Value, it seems you have to loop through all the uses, get the User and check if this is inside the loop. Something like what replaceLoopInvariantUses() does.
> 
> It is not that much overhead to use L->contains on each use, at least less than maintaining LCSSA. `replaceLoopInvariantUses()` uses it to only change something for users inside the loop, i.e. ignore LCSSA PHIs. You can do the same just to find outside users.
> 
> Why not all of them?

[[ https://gcc.gnu.org/ml/gcc-patches/2004-03/msg02212.html | Here ]], Zdenek Dvorak mentions 2 advantages I think:
```
1) Updating it during unrolling/peeling/versioning is trivial, since
     we do not need to care about the uses outside of the loop.
2) About induction variables
```
I think I have already mentioned 1) with the example of loop unswitching, no?

You may mean [[ https://gcc.gnu.org/onlinedocs/gccint/LCSSA.html | the other GCC page for LCSSA ]]. Basically, the first 2 there are the same. The third I think is not an advantage. If I understood correctly, it basically says that even with LCSSA, we have to go and do the RAUW for the new phi node.

> with the new PHINode that combines the 2 PHIs of the versioned loops.
SimpleLoopUnswitch does as you described:

Great, now it's clear!

> It is not that much overhead to use L->contains on each use, at least less than maintaining LCSSA.

Ok, got it.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75233/new/

https://reviews.llvm.org/D75233