[PATCH] D122200: [AMDGPU] Split waterfall loop exec manipulation

Wed Mar 23 18:14:35 PDT 2022

critson added a comment.

In D122200#3402550 <https://reviews.llvm.org/D122200#3402550>, @arsenm wrote:

> In D122200#3401933 <https://reviews.llvm.org/D122200#3401933>, @ruiling wrote:
>
>> I agree this is a good idea and the change looks good, just only a few minor comments. btw, Is this the only case that we modify EXEC in middle of a block?
>
> I don't think so. WWM/WQM still ends up in the middle of a block I believe. Kills also start in the middle of blocks, but are moved to terminators

WWM/WQM is probably the only remaining major example -- which I can work on fixing.
Kills and demotes always get lowered such that manipulation occurs at end of a block.

At the moment, I feel that pseudos that introduce exec manipulation do not themselves need to be treated as exec manipulation, in terms of not being in the middle of a block.
The burden is on the code lowering them.
Of course we have to take care about reordering around these instructions, but generally convergent attributes take care of this.

================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:337-338
+  // Collect loop instructions, potentially spanning multiple blocks
+  for (auto MBBI = LoopHeader->getIterator(),
+            MBBE = std::next(LoopEnd->getIterator());
+       MBBI != MBBE; ++MBBI) {
----------------
ruiling wrote:
> critson wrote:
> > arsenm wrote:
> > > I've thought about adding some kind of iterator which would automatically cross fallthrough blocks
> > I can certainly see it being useful in a few places.
> I would suggest you simply push LoopHeader and LoopEnd into the Blocks here, otherwise we have to go through the blocks by checking against predecessors/successors. The block appears between two blocks in the block-list may not appear between them in the CFG.
> You can also place an assert the LoopHeader has unique successor, which should be LoopEnd.
I have rewritten the loop to follow successors and validate there is a linear flow.
I also lifted the Block collection from the inner loop.
I don't want to limit this to just LoopHeader and LoopEnd, in case we extend the waterfall loop in future.

================
Comment at: llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp:570
   // after the loop. So we know that the old reg is not live throughout the
   // whole block anymore.
+  for (auto *Block : Blocks)
----------------
ruiling wrote:
> Please help update the comment a little bit to reflect the new code. It is better to place an assert that Blocks.size() <=2 here. The reason is if we have more blocks, we can not clear the AliveBlocks bit in this simple way.
I have updated the comment.
Sorry, I still don't understand the Block.size() limit -- surely this works for any length waterfall loop if all the blocks are chained together linearly? 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122200/new/

https://reviews.llvm.org/D122200