[PATCH] D111611: [NFC] [LoopPeel] Change the way DT is updated for loop exits

Mon Oct 18 01:09:57 PDT 2021

nikic added inline comments.

================
Comment at: llvm/lib/Transforms/Utils/LoopPeel.cpp:598
+      DTU.applyUpdates(
+          {{DominatorTree::Insert, cast<BasicBlock>(VMap[IDom]), NewBB}});
     }
----------------
mkazantsev wrote:
> nikic wrote:
> > This code doesn't make sense to me. `DT->addNewBlock()` directly adds a new edge to the DT. `DTU.applyUpdates()` applies CFG updates to the DT. You seem to be passing a DT update to a CFG API here. Maybe that happens to work, but this is not how this is supposed to be used.
> > 
> > I'm not sure using DTU in this code is really useful.
> Let me try to explain. We are going to expand the logic of exit detection on D110922. The current implementation relies on fact that we only need to update idoms of loop blocks and their immediate exits to get the right tree. However, this assumption will break once we handle exit blocks that may have successors, meaning idom for those successors may change. Imagine the case:
> ```
> loop:
>   br loop2, exit1
> 
> loop2:
>   br loop3, exit 2
> 
> loop3:
>   br loop, exit 3
> 
> exit1:
>   br post_exit
> 
> exit2:
>   br post_exit
> 
> exit3:
>   br post_exit
> ```
> 
> After all it is done, idom of post_exit will change. Meaning that we should not just change idoms of exits, but also their successors. Now this situation is impossible, but it becomes possible as the scope of the optimization expands. We want to make current implementation independent of the scope (in fact, this patch is just a split-out piece of that one). That's why we need to use insert updates in the caller method.
> 
> Now, lazy strategy (at least in theory) should require less tree traversal than eager one. That's why we use lazy DTU in the caller.
> 
> Finally, we use DTU in the outer method and we have this `cloneLoopBlocks` here in that, I agree, we could leave it be with DT. However, the code written as is both reads and writes DT. What is the interaction between this and pending lazy updates, isn't very clear. I guess it just happens to be OK, but since we are to use DTU anyways, why not just use it everywhere for uniformity reasons? Just to avoid the situation that part of updates are applied, part is pending, and we also get idom from this half-updated tree. It should *happen* to work, but it doesn't really look nearly safe.
> 
> So the prime motivation for this was "ok, we are going to use lazy DTU anyways, let's be consistent and get all updates through it". It means that `cloneLoopBlocks` should be deprived of direct DT access, to avoid temptation to read something from it, knowing that not all updates are applied. This is why no `DT` is here, and there is `DTU` in its stead. And now, DTU simply doesn't have `addNewBlock`.
> 
> Does it all make more sense now?
I understand the motivation for this change -- my problem is that the code misuses the DomTreeUpdater API. There is (generally) no CFG edge between `VMap[IDom]` and `NewBB`, but you are pretending there is. Maybe this happens to work because of implementation details, but it's not the correct way to use the API.

You'd have to actually insert all new control flow edges for the cloned blocks, not just the idom edges in the DT. It's one of the cases where directly updating the DT is simpler than going through DTU.

FYI the corresponding code for unrolling is https://github.com/llvm/llvm-project/blob/3f0b178de21ee82791a6ebe198314f14c0287a44/llvm/lib/Transforms/Utils/LoopUnroll.cpp#L643-L658 -- probably the same code would work for peeling as well.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111611/new/

https://reviews.llvm.org/D111611