[llvm] [AMDGPU] Improve isBasicBlockPrologue to only add necessary instructions (PR #113303)

Wed Oct 30 08:22:19 PDT 2024

ruiling wrote:

Thanks for doing this! But frankly speaking, this is making the implementation more confusing. I think most of the weirdness in the implementation is caused by the way the target hook is defined. Querying whether each individual instruction is a block prologue instruction does not seem like the right way to go.

The basic idea behind the block prologue is that a special instruction sets up exec for the block, and all the instructions before that instruction are prologue instructions. So the prologue search should forward-iterate over the instructions until it sees the first instruction that modifies exec, which would be the last prologue instruction. There are ways to speed up the search so that it does not visit all the instructions in the worst case. For example, we can check each instruction against the instruction types that could possibly be in the prologue. If we find an instruction that is **NOT** in the list of possible prologue instructions before we see an exec setup instruction, then we should return and report that there is no prologue instruction at all.

Following this idea, maybe we should define the hook as `skipBlockPrologue(MachineBasicBlock, beginIterator)`, which would check for prologue instructions starting from `beginIterator` and return the iterator right after the last prologue instruction (if there is no prologue instruction, just return `beginIterator`). Then the implementation would be a simple forward iteration over the instructions with no recursion. With the returned iterator, I think the caller should be able to do whatever they want with the prologue instructions. Sounds reasonable?
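
To make the proposal a bit more concrete, here is a rough sketch of the search on a toy model rather than on real LLVM types. `Instr`, `Kind`, `Block` and `Iter` are made-up stand-ins for `MachineInstr`, `MachineBasicBlock` and its iterator, and the three-way classification of instructions is an assumption for illustration only, not what the AMDGPU backend actually checks.

```cpp
#include <vector>

// Hypothetical classification of an instruction, for illustration only.
enum class Kind {
  PrologueCandidate, // an instruction type that could possibly be in the prologue
  ExecSetup,         // the instruction that sets up exec for the block
  Other              // anything that can never be part of the prologue
};

// Toy stand-in for MachineInstr (not an LLVM type).
struct Instr {
  Kind kind;
};

// Toy stand-ins for MachineBasicBlock and its iterator.
using Block = std::vector<Instr>;
using Iter = Block::const_iterator;

// Returns the iterator right after the last prologue instruction, or Begin
// if there is no prologue. A single forward scan, no recursion.
Iter skipBlockPrologue(const Block &MBB, Iter Begin) {
  for (Iter I = Begin, E = MBB.end(); I != E; ++I) {
    switch (I->kind) {
    case Kind::ExecSetup:
      // The exec setup is the last prologue instruction.
      return I + 1;
    case Kind::PrologueCandidate:
      // Might be part of the prologue; keep looking for the exec setup.
      break;
    case Kind::Other:
      // Seen before any exec setup: report that there is no prologue.
      return Begin;
    }
  }
  // Reached the end of the block without finding an exec setup.
  return Begin;
}
```

In this model the caller gets the prologue as the range `[Begin, skipBlockPrologue(MBB, Begin))`, which is empty when the function returns `Begin`, and the scan stops early at the first instruction that cannot be in a prologue.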

https://github.com/llvm/llvm-project/pull/113303