[PATCH] D111611: [NFC] [LoopPeel] Change the way DT is updated for loop exits

Thu Oct 14 22:24:35 PDT 2021

mkazantsev added inline comments.

================
Comment at: llvm/lib/Transforms/Utils/LoopPeel.cpp:598
+      DTU.applyUpdates(
+          {{DominatorTree::Insert, cast<BasicBlock>(VMap[IDom]), NewBB}});
     }
----------------
nikic wrote:
> This code doesn't make sense to me. `DT->addNewBlock()` directly adds a new edge to the DT. `DTU.applyUpdates()` applies CFG updates to the DT. You seem to be passing a DT update to a CFG API here. Maybe that happens to work, but this is not how this is supposed to be used.
> 
> I'm not sure using DTU in this code is really useful.
Let me try to explain. We are going to expand the logic of exit detection on D110922. The current implementation relies on fact that we only need to update idoms of loop blocks and their immediate exits to get the right tree. However, this assumption will break once we handle exit blocks that may have successors, meaning idom for those successors may change. Imagine the case:
```
loop:
  br loop2, exit1

loop2:
  br loop3, exit 2

loop3:
  br loop, exit 3

exit1:
  br post_exit

exit2:
  br post_exit

exit3:
  br post_exit
```

After all it is done, idom of post_exit will change. Meaning that we should not just change idoms of exits, but also their successors. Now this situation is impossible, but it becomes possible as the scope of the optimization expands. We want to make current implementation independent of the scope (in fact, this patch is just a split-out piece of that one). That's why we need to use insert updates in the caller method.

Now, lazy strategy (at least in theory) should require less tree traversal than eager one. That's why we use lazy DTU in the caller.

Finally, we use DTU in the outer method and we have this `cloneLoopBlocks` here in that, I agree, we could leave it be with DT. However, the code written as is both reads and writes DT. What is the interaction between this and pending lazy updates, isn't very clear. I guess it just happens to be OK, but since we are to use DTU anyways, why not just use it everywhere for uniformity reasons? Just to avoid the situation that part of updates are applied, part is pending, and we also get idom from this half-updated tree. It should *happen* to work, but it doesn't really look nearly safe.

So the prime motivation for this was "ok, we are going to use lazy DTU anyways, let's be consistent and get all updates through it". It means that `cloneLoopBlocks` should be deprived of direct DT access, to avoid temptation to read something from it, knowing that not all updates are applied. This is why no `DT` is here, and there is `DTU` in its stead. And now, DTU simply doesn't have `addNewBlock`.

Does it all make more sense now?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111611/new/

https://reviews.llvm.org/D111611