[PATCH] D17473: [LoopUnroll] Avoid unnecessary DT recomputation.

Fri Feb 19 21:20:23 PST 2016

mzolotukhin added inline comments.

================
Comment at: lib/Transforms/Utils/LoopUnroll.cpp:536
@@ +535,3 @@
+            ExitIDom ? DT->findNearestCommonDominator(ExitIDom, *BI) : *BI;
+      }
+      DT->changeImmediateDominator(Exit, ExitIDom);
----------------
joker.eph wrote:
> mzolotukhin wrote:
> > chandlerc wrote:
> > > mzolotukhin wrote:
> > > > The reason we're doing this is that dominator for an exit block could change after unrolling. Consider a diamond-like loop body with header `H`, side blocks  `A` and `B`, and latchblock  `L`. Suppose `B` is exiting to `E`.
> > > > 
> > > > If `B` is the immediate dominator of `E` before unrolling, it's not the case after unrolling - we'll have several blocks exiting to `E`, so we have to actually find their common dominator.
> > > > 
> > > > Probably, there is a more efficient way of doing this, but even in this form it's a pure win over what we have now.
> > > Ahh, I see.
> > > 
> > > So, the reason that this seemed odd to me is that all of these blocks that now branch to E come from unrolled copies of the loop, and so they should all have the same IDom -- the IDom of B from the first copy of the loop (which I think pretty much has to be the header, but I've not thought very hard about that).
> > > 
> > > But have we done any CFG simplification during unrolling at this point? (I know we talked about that, not sure any of it landed...) If so, that would of course potentially invalidate the idea of basing this purely on the loop structure and structural nature of unrolling.
> > > 
> > > It's not so much that this is ever going to be expensive at runtime (the domtree should make this pattern quite fast), it was just that I wanted to understand the complexity.
> > > 
> > > I think explaining some of the context of how to think about this code in comments would be very useful here.
> > > all of these blocks that now branch to E come from unrolled copies of the loop, and so they should all have the same IDom -- the IDom of B from the first copy of the loop
> > I don't think it's correct. IDom of B might be H, but doesn't have to be (you can imagine a diamond in diamond structure to prove it).
> > 
> > To explain it better I'll try to use ASCII mad skills here. Here is our (slightly modified) original loop body:
> > ```
> >   (H)
> >    |
> >    v
> >   (I)
> >   / \
> >  v   v
> > (A) (B) --> (E)
> >   \ /
> >    v
> >   (L)
> > ```
> > Here IDom(B) = I, IDom(E) = B.
> > 
> > After unrolling we'll have:
> > ```
> >   (H)
> >    |
> >    v
> >   (I)
> >   / \
> >  v   v
> > (A) (B) ------
> >   \ /         \
> >    v          |
> >   (L)         |
> >    |          |
> >    v          |
> >   (H')        |
> >    |          |
> >    v          |
> >   (I')        |
> >   /  \        |
> >  v    v       v
> > (A') (B') -> (E)
> >   \  /
> >    vv
> >   (L')
> > 
> > ```
> > In the unrolled loop IDom(B) = I, IDom(B') = I', IDom(E) = NearestCommonDominator(B', B) = I. Pleas note, that it doesn't have to be the header.
> > 
> > That said, I see what you meant by using structural nature of unrolling - we do exploit it when we assign dominators for cloned blocks.
> > 
> > As for the CFG simplification - we perform some folding right after this, in `FoldBlockIntoPredecessor`, which I also updated in this patch.
> > 
> It is not necessarily the header, but it is still `IDom(B)`, do you have an example where `IDom(E) != IDom(B)`? (Yeah, I love your asciiart-fu ;))
One example is that `IDom(E)` might be `B` itself. Also, if you construct an example with nested diamonds, you'll probably be able to get `IDom(E)` != `IDom(B)`.

But after thinking about it, I think `IDom(E)` = `NearestCommonDominator(B, H')`. It looks like obvious to me now, but it's too late on Friday to actually accurately prove it:)

http://reviews.llvm.org/D17473