[llvm] [AMDGPU] Allow hoisting of V_READFIRSTLANE_B32 for uniform operand (PR #178312)

Wed Feb 4 20:50:28 PST 2026

ssahasra wrote:

> Basically nobody objects the transformation itself. But I hear different opinions on how it shall be done:

That looks like a good starting point, and it seems that's what @jayfoad has been pressing for. He even changed the title of the PR to reflect it. So if it's a very specific transformation that we want, why not just implement it as a pass? The definition would be "hoist readfirstlane up to the point where the current input is still the immediate dominator, if the current input is uniform". This doesn't mess around with convergent semantics at all. Does the hoisting need to happen at the time of selection? Can it be done earlier in LLVM IR, or at any point where MIR is still in SSA form?
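
To make that definition concrete, here is a rough sketch on a toy CFG model. It is not written against LLVM's API: `Block`, `Value`, `ReadFirstLane`, `dominates` and `hoistReadFirstLane` are all hypothetical names, the `uniform` flag stands in for whatever uniformity analysis would be used, and "hoist up to the input" is read here as "move into the block that defines the input", which is only one possible interpretation.

```cpp
#include <vector>

// Toy model only; none of these types exist in LLVM.
struct Block {
  int idom;  // index of the immediate dominator, -1 for the entry block
};

struct Value {
  int defBlock;  // block that defines this value
  bool uniform;  // assumed to come from a uniformity analysis
};

struct ReadFirstLane {
  int block;  // block that currently holds the readfirstlane
  int input;  // index of its operand in the value table
};

// True if block `a` dominates block `b`, found by walking b's idom chain.
bool dominates(const std::vector<Block> &blocks, int a, int b) {
  for (int cur = b; cur != -1; cur = blocks[cur].idom)
    if (cur == a)
      return true;
  return false;
}

// Hoist the readfirstlane into the block defining its input, but only when
// that input is uniform. The instruction itself stays convergent; legality
// is decided here, per instance, rather than by dropping the attribute.
bool hoistReadFirstLane(const std::vector<Block> &blocks,
                        const std::vector<Value> &values,
                        ReadFirstLane &rfl) {
  const Value &in = values[rfl.input];
  if (!in.uniform)
    return false;  // divergent input: the result depends on the active lanes
  if (rfl.block == in.defBlock)
    return false;  // already as high as it can go
  if (!dominates(blocks, in.defBlock, rfl.block))
    return false;  // unexpected in SSA form; stay conservative
  rfl.block = in.defBlock;
  return true;
}
```

With a chain of blocks 0 -> 1 -> 2, a readfirstlane in block 2 whose uniform input is defined in block 1 would move to block 1, while one with a divergent input stays where it is.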

> I think we first need to agree. I personally prefer this very limited and small patch with the limited damage.

How about looking for an implementation with _no_ damage? Setting `NoConvergent` means that this instruction is no longer dependent on control flow, and it can be moved _anywhere_. It is not conditional on whether the inputs are uniform. That counts as "damage".

Even with a hypothetical new readanylane, it can only be moved up to the point where the current input is uniform. Existing optimizations would have to be taught about this restriction; without that, having this operation is dangerous. "Safe by default" would mean that we make it convergent, and that defeats the purpose of having new intrinsics.

So I believe the best approach is to define a new peephole optimization on readfirstlane, and reuse it over and over again wherever it makes sense.

Recent activity in inst-combining convergent operations with uniform operands is tangentially interesting, btw ... #166955 and #116953

https://github.com/llvm/llvm-project/pull/178312