[llvm] [AMDGPU] Improve isBasicBlockPrologue to only add necessary instructions (PR #113303)

Sun Nov 3 23:33:01 PST 2024

ruiling wrote:

>  An instruction belongs to the prologue if it is part of the def-use chain that ends up on the exec mask writing instruction.

Is this correct? For example, in the case below, is the second SGPR reload from a VGPR not in the prologue?
v_readlane s0, v0, 1
v_readlane s1, v0, 2
s_or_b32 exec, exec, s0

>     1. We consider that the lexical order of the instruction in the prologue is implied by the def-use relations. That allows us to avoid checking def-use relations in the prologue sequence.

I don't think the def-use relationship is important when checking whether an instruction belongs to the prologue. The prologue concept is mainly used to find the right insertion point for vector instructions (anything else?). For scalar instructions that produce the input to the exec mask setup, yes, we have to put them inside the prologue for correctness. Other scalar instructions can go either in the prologue or after it. Right?
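
To make the distinction concrete, here is a minimal toy sketch of the def-use rule applied to the three-instruction case above. It is not LLVM code: `ToyInst`, `feedsExecWrite`, and `exampleBlock` are hypothetical names, and registers are modelled as plain strings.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy instruction: one optional def and a list of uses. Not an LLVM type.
struct ToyInst {
  std::string Name;
  std::string Def;               // register written ("" if none)
  std::vector<std::string> Uses; // registers read
};

// For each instruction, report whether it is the first EXEC write or
// feeds it through a def-use chain. Everything else is marked false.
std::vector<bool> feedsExecWrite(const std::vector<ToyInst> &Block) {
  std::vector<bool> Feeds(Block.size(), false);

  // Find the first instruction that writes EXEC.
  std::size_t ExecIdx = Block.size();
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (Block[I].Def == "exec") {
      ExecIdx = I;
      break;
    }
  }
  if (ExecIdx == Block.size())
    return Feeds; // no EXEC write in this block

  // Walk backwards, tracking the registers still needed by the chain.
  std::set<std::string> Needed(Block[ExecIdx].Uses.begin(),
                               Block[ExecIdx].Uses.end());
  Feeds[ExecIdx] = true;
  for (std::size_t I = ExecIdx; I-- > 0;) {
    const ToyInst &MI = Block[I];
    if (!MI.Def.empty() && Needed.count(MI.Def)) {
      Feeds[I] = true;
      Needed.erase(MI.Def);
      Needed.insert(MI.Uses.begin(), MI.Uses.end());
    }
  }
  return Feeds;
}

// The three-instruction case from this comment.
std::vector<ToyInst> exampleBlock() {
  return {
      {"v_readlane s0, v0, 1", "s0", {"v0"}},
      {"v_readlane s1, v0, 2", "s1", {"v0"}},
      {"s_or_b32 exec, exec, s0", "exec", {"exec", "s0"}},
  };
}
```

Under this toy rule the first readlane and the `s_or_b32` are marked, while the readlane into `s1` is not, so it is one of the scalar instructions that could sit either inside the prologue or after it.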

>     2. The prologue sequence includes all instructions from the basic block beginning to the first instruction modifying EXEC.

Yes

>     1. MachineBasicBlock:   SkipPHIsAndLabels and SkipPHIsLabelsAndDebug use isBasicBlockPrologue iterating over the instruction list. To change those call sites we'd need to rely on the order of instructions: PHIs first, then Prologue Sequence, Debug, Labels, PseudoOps, and whatever else. Nevertheless, this order is not guaranteed and might differ like PHIs, Labels, PseudoOps, Debug, Prologue, or any permutation of the latter. This complicates things even more.

Thank you for taking a further look into this. This is indeed an issue we need to solve properly. I am not sure whether these instructions have a guaranteed order; it seems that we cannot assume too much (though we can still assume PHIs always come first?). So, to get a correct implementation of `skipBlockPrologue`, it feels reasonable to pass additional boolean arguments such as `skipLabel` and `skipDebug` to `skipBlockPrologue` to tell it to skip these instructions as well.
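
A rough sketch of what I have in mind, written against a toy model rather than the real `MachineBasicBlock` API. `ToyKind` and `skipBlockPrologueToy` are hypothetical names, and the only ordering assumption is that PHIs come first.

```cpp
#include <cstddef>
#include <vector>

// Toy classification of instructions. Not an LLVM type.
enum class ToyKind { PHI, Label, Debug, Prologue, Other };

// Hypothetical stand-in for skipBlockPrologue with the two extra flags.
// Returns the index of the first instruction that is not skipped.
std::size_t skipBlockPrologueToy(const std::vector<ToyKind> &Block,
                                 bool SkipLabel, bool SkipDebug) {
  std::size_t I = 0;

  // Assumption: PHIs always come first.
  while (I < Block.size() && Block[I] == ToyKind::PHI)
    ++I;

  // After the PHIs no particular order is assumed: keep going while the
  // instruction is a prologue instruction or a kind the caller asked to skip.
  while (I < Block.size()) {
    ToyKind K = Block[I];
    bool Skip = K == ToyKind::Prologue ||
                (SkipLabel && K == ToyKind::Label) ||
                (SkipDebug && K == ToyKind::Debug);
    if (!Skip)
      break;
    ++I;
  }
  return I;
}
```

For a block ordered PHI, Label, Debug, Prologue, Other, passing both flags as true returns index 4. With `SkipDebug` set to false the scan stops at index 2, before the prologue instruction, so a prologue instruction that follows a debug instruction is not skipped in this sketch.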

But it's unclear to me what the expected behavior is in corner cases. For example, in `SkipPHIsAndLabels()` we would want to find the insertion point before debug instructions but after prologue instructions. So what if the IR has prologue instructions after debug instructions? What would the IR for such a case look like?

https://github.com/llvm/llvm-project/pull/113303