[llvm] [AMDGPU] Allow hoisting of V_READFIRSTLANE_B32 for uniform operand (PR #178312)

Wed Feb 4 23:33:08 PST 2026

rampitec wrote:

> > Basically nobody objects to the transformation itself. But I hear different opinions on how it should be done:
> 
> That looks like a good starting point, and it seems that's what @jayfoad has been pressing. He even changed the title of the PR to reflect it.

Well, I changed it when I removed the pseudo at Matt's request. And that's the point: you want the pseudo back and an intrinsic on top of it. Because when you say it can be done earlier than selection, it means it needs a new intrinsic.

> So if it's a very specific transformation that we want, why not just implement it as a pass? The definition would be "hoist readfirstlane up to the point where the current input is still the immediate dominator, if the current input is uniform". This doesn't mess around with convergent semantics at all. Does the hoisting need to happen at the time of selection? Can it be done earlier in LLVM IR, or at any point where MIR is still in SSA form?

I actually do not see why this should be limited to hoisting only. I'd say it can be sunk as well. If a value was uniformly defined over a superset of lanes at the dominator, it is equally uniform in any narrower scope, except for exec = 0.
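
To illustrate the claim, here is a minimal toy model, not LLVM or hardware code. The names `readfirstlane`, `dominator_exec` and `narrower_masks` are made up for this sketch, and a wave of 4 lanes is assumed for brevity. It checks that a value uniform over the dominator's active lanes reads back the same under every non-empty narrower exec mask, while a divergent value does not. The exec = 0 case is deliberately excluded.

```python
from itertools import product

def readfirstlane(values, exec_mask):
    """Toy model: return the value held by the lowest-numbered active lane."""
    for lane, active in enumerate(exec_mask):
        if active:
            return values[lane]
    # exec == 0 is outside this sketch; the toy model simply rejects it.
    raise ValueError("no active lanes (exec == 0)")

# Hypothetical 4-lane wave: lanes 0..2 are active at the dominator, lane 3 is not.
dominator_exec = (True, True, True, False)

# Every non-empty exec mask that is a subset of the dominator's mask.
narrower_masks = [
    m for m in product([False, True], repeat=4)
    if any(m) and all(d or not a for a, d in zip(m, dominator_exec))
]

# Uniform over the dominator's active lanes (the inactive lane holds garbage).
uniform_values = [7, 7, 7, 99]
hoisted = readfirstlane(uniform_values, dominator_exec)
sunk_results = {readfirstlane(uniform_values, m) for m in narrower_masks}

# Divergent input: the result now depends on which lanes are active.
divergent_values = [1, 2, 3, 99]
divergent_results = {readfirstlane(divergent_values, m) for m in narrower_masks}
```

With the uniform input, `sunk_results` contains only the hoisted value, so moving the read in either direction does not change the result in this model. With the divergent input the result depends on the mask, which is why the uniformity proof matters.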

> > I think we first need to agree. I personally prefer this very limited and small patch with the limited damage.
> 
> How about looking for an implementation with _no_ damage? Setting `NoConvergent` means that this instruction is no longer dependent on control flow, and it can be moved _anywhere_. It is not conditional on whether the inputs are uniform. That counts as "damage".

This is my understanding: if the input is uniform, it can be moved anywhere. But giving the user an intrinsic is another level: it gives the user a way to _assert_ uniformity. Not a bad thing, but it is a separate thing. Here we have proven it. If you provide an intrinsic, that is user control.

> Even with a hypothetical new readanylane, it can only be moved up to the point where the current input is uniform. Existing optimizations will have to be taught about this, without which, having this operation is dangerous. "Safe by default" would mean that we make it convergent, and that defeats the purpose of having new intrinsics.
> 
> So I believe the best approach is to define a new peephole optimization on readfirstlane, and reuse it over and over again wherever it makes sense.
> 
> Recent activity in inst-combining convergent operations with uniform operands is tangentially interesting, btw ... #166955 and #116953

https://github.com/llvm/llvm-project/pull/178312