[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors (PR #108596)

Mon Sep 16 10:23:30 PDT 2024

alex-t wrote:

> I agree with Jay that this change isn't acceptable from a codegen quality point of view.
> 
> > Then it appeared that instructions loading the values spilled to the memory and used in the current block must be placed at the block beginning before they are used but after the point where EXEC mask is restored, since they are loading VGPRs and, hence, read EXEC. Hence, we had to consider all spilling opcodes to belong to the block prologue.
> 
> That does not seem logical. The restores themselves only read EXEC, they do not write it. So why should they be part of the prolog?
> 
> It seems the whole problem could be avoided by not considering VGPR restores to be part of the block prolog?

Let me describe the complete case. We have register allocation split in 2 steps - the first run allocated SGPRs and the second one takes care of VGPRs. It might happen that during the SGPRs allocation we had to spill the register containing EXEC mask value to the memory. Hence, we have to load it back at the beginning of the flow block BEFORE the OR-ing with current EXEC, as it is a live-in value that is an operand for the OR. As soon as we inserted the reload the prologue sequence is broken and during the VGPRs allocation any spill/reload will be placed before the point at which the exec mask is restored.

The issue was addressed here: https://github.com/llvm/llvm-project/pull/69924 (also here https://reviews.llvm.org/D145329).
Finally, the decision was made to consider all spill/reload opcodes as belonging to the block prologue.

Although, revisiting this now, I still don't understand why they decided to include ALL spill opcodes in the prologue, but not only the SGPR spills? Clearly, none of the VGPR reloads really belong to the prologue.

At a first glance, changing the isSpill(opcode) to isSGPRSpill(opcode) in the snippet below would solve the initial case.
`  return IsNullOrVectorRegister &&
         (isSpill(Opcode) || (!MI.isTerminator() && Opcode != AMDGPU::COPY &&
                              MI.modifiesRegister(AMDGPU::EXEC, &RI)));
}`

I need to look at this a bit more. I am sure they would have done this if such a simple change had solved the problem.

https://github.com/llvm/llvm-project/pull/108596