[llvm] [AMDGPU] Allow hoisting of V_READFIRSTLANE_B32 for uniform operand (PR #178312)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 10 01:54:33 PST 2026
jayfoad wrote:
> This is safe; so far I have not seen a counterexample.
Here's some example IR:
```
define amdgpu_ps i32 @f(i32 inreg %uni, i32 %div) {
entry:
  %cond = icmp eq i32 %uni, %div
  br i1 %cond, label %body, label %ret

body:
  %rfl = call i32 @llvm.amdgcn.readfirstlane(i32 %uni)
  ret i32 %rfl

ret:
  ret i32 0
}
```
After finalize-isel the MIR looks like this (your patch only adds the `noconvergent` flag):
```
bb.0.entry:
  successors: %bb.1, %bb.2
  liveins: $sgpr0, $vgpr0
  %4:vgpr_32 = COPY $vgpr0
  %3:sgpr_32 = COPY $sgpr0
  %6:sreg_32 = V_CMP_EQ_U32_e64 %3, %4, implicit $exec
  %5:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
  %0:sreg_32 = SI_IF killed %6, %bb.2, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
  S_BRANCH %bb.1

bb.1.body:
  %8:vgpr_32 = COPY %3
  %7:sreg_32_xm0 = noconvergent V_READFIRSTLANE_B32 %8, implicit $exec
  %1:sreg_32 = COPY %7
  %10:vgpr_32 = COPY %1, implicit $exec

bb.2.UnifiedReturnBlock:
  %2:vgpr_32 = PHI %5, %bb.0, %10, %bb.1
  SI_END_CF %0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
  %9:sreg_32_xm0 = V_READFIRSTLANE_B32 %2, implicit $exec
  $sgpr0 = COPY %9
  SI_RETURN_TO_EPILOG $sgpr0
```
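Now imagine a MIR transformation pass that replaces all uses of %3 with %4 in bb.1.body. That rewrite is sound on its own terms: bb.1.body is only entered by lanes where the V_CMP_EQ_U32 succeeded, so %3 == %4 in every lane that is active there. Such a transformation probably does not exist today, but I don't see any reason why it should not be allowed. After that, nothing stops these two instructions from being hoisted out of bb.1.body into bb.0.entry: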
```
%8:vgpr_32 = COPY %4
%7:sreg_32_xm0 = noconvergent V_READFIRSTLANE_B32 %8, implicit $exec
```
But then the program is broken: the hoisted readfirstlane runs with the exec mask of bb.0.entry, so it can read %4 from a lane that was never active inside the body, where %4 need not equal %3.
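To make the lane-level hazard concrete, here is a toy Python model of a single wave. It is only an illustration under simplifying assumptions: the 4-lane wave size, the input values, and the helper `readfirstlane` are hypothetical, and the model tracks nothing except the exec mask. It compares the original body, the body after the %3 -> %4 substitution, and the hoisted version.

```python
from typing import List

WAVE_SIZE = 4  # hypothetical wave size, chosen small for readability


def readfirstlane(values: List[int], exec_mask: List[bool]) -> int:
    """Simplified model of V_READFIRSTLANE_B32: return the value held by
    the lowest-numbered lane that is active in exec_mask."""
    for value, active in zip(values, exec_mask):
        if active:
            return value
    raise ValueError("no active lanes")


uni = 7                          # models %3: uniform value, same in every lane
div = [3, 7, 5, 7]               # models %4: divergent value, one per lane
entry_exec = [True] * WAVE_SIZE  # all lanes active in bb.0.entry

# Models V_CMP_EQ_U32 + SI_IF: only lanes where uni == div enter bb.1.body.
body_exec = [e and (uni == d) for e, d in zip(entry_exec, div)]

# Original body: readfirstlane of a copy of %3 under the body's exec mask.
original = readfirstlane([uni] * WAVE_SIZE, body_exec)
# After replacing %3 with %4 in the body: still correct, because every
# active lane in the body has div == uni.
rewritten = readfirstlane(div, body_exec)
# After hoisting into bb.0.entry: reads lane 0, which never entered the body.
hoisted = readfirstlane(div, entry_exec)

print(original, rewritten, hoisted)  # 7 7 3
```

The substitution alone preserves the result; it is only the combination with hoisting, which changes the exec mask the readfirstlane sees, that produces the wrong value.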
Is this too hypothetical?
https://github.com/llvm/llvm-project/pull/178312