[llvm] WIP: [AMDGPU] Use s_cselect_b32 for uniform select of f32 values (PR #111688)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 9 07:40:30 PDT 2024
================
@@ -9,7 +9,9 @@ define amdgpu_ps float @xor3_i1_const(float inreg %arg1, i32 inreg %arg2) {
; GCN-NEXT: v_cmp_lt_f32_e64 s[2:3], s0, 0
; GCN-NEXT: v_cmp_lt_f32_e32 vcc, s0, v0
; GCN-NEXT: s_and_b64 s[0:1], s[2:3], vcc
-; GCN-NEXT: v_cndmask_b32_e64 v0, 1.0, 0, s[0:1]
+; GCN-NEXT: s_and_b64 s[0:1], s[0:1], exec
+; GCN-NEXT: s_cselect_b32 s0, 0, 1.0
+; GCN-NEXT: v_mov_b32_e32 v0, s0
----------------
jayfoad wrote:
@alex-t codegen is worse here because the condition comes from a VALU compare _and_ the result of the select needs to be in a VGPR. Before SIFixSGPRCopies the MIR would look something like:
```
%2:sreg_64 = nofpexcept V_CMP_EQ_F32_e64 0, %0:sgpr_32, 0, %1:sgpr_32, 0, implicit $mode, implicit $exec
%3:sgpr_32 = S_MOV_B32 1084227584
%4:sgpr_32 = S_MOV_B32 1056964608
$scc = COPY %2:sreg_64 // <== copy from vector condition code
%5:sgpr_32 = S_CSELECT_B32 killed %4:sgpr_32, killed %3:sgpr_32, implicit $scc
$vgpr0 = COPY %5:sgpr_32 // <== copy result back to vgpr
```
After SIFixSGPRCopies it looks like this (the COPY to $scc has been lowered to an S_AND_B64 with $exec, relying on S_AND_B64 setting SCC when its result is nonzero):
```
%2:sreg_64 = nofpexcept V_CMP_EQ_F32_e64 0, %0:sgpr_32, 0, %1:sgpr_32, 0, implicit $mode, implicit $exec
%3:sgpr_32 = S_MOV_B32 1084227584
%4:sgpr_32 = S_MOV_B32 1056964608
%6:sreg_64 = S_AND_B64 %2:sreg_64, $exec, implicit-def $scc
%5:sgpr_32 = S_CSELECT_B32 killed %4:sgpr_32, killed %3:sgpr_32, implicit $scc
$vgpr0 = COPY %5:sgpr_32
```
Do you think SIFixSGPRCopies could detect this case, and instead convert S_CSELECT_B32 into V_CNDMASK_B32?
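For the record, here is a rough sketch of what that conversion could look like. The helper name and surrounding plumbing are hypothetical, not actual pass code; it assumes we have already matched the S_AND_B64-with-$exec that feeds $scc and the S_CSELECT_B32 that consumes it, and that it runs in the usual SIFixSGPRCopies.cpp context:
```
// Sketch only; assumes the usual SIFixSGPRCopies.cpp includes (SIInstrInfo.h,
// MachineRegisterInfo.h, the AMDGPU::* instruction/regclass enums).
//
// Rewrites
//   %6:sreg_64 = S_AND_B64 %cond, $exec, implicit-def $scc
//   %5:sgpr_32 = S_CSELECT_B32 %tval, %fval, implicit $scc
// into
//   %5:vgpr_32 = V_CNDMASK_B32_e64 0, %fval, 0, %tval, %cond, implicit $exec
// Note the operand swap: S_CSELECT picks src0 when SCC is set, while
// V_CNDMASK picks src1 when the per-lane condition bit is set.
static void convertCSelectToCndMask(MachineInstr &Sel, MachineInstr &And,
                                    const SIInstrInfo *TII,
                                    MachineRegisterInfo &MRI) {
  Register Cond = And.getOperand(1).getReg(); // the original VALU condition
  Register NewDst = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
  MachineInstr *CndMask =
      BuildMI(*Sel.getParent(), Sel, Sel.getDebugLoc(),
              TII->get(AMDGPU::V_CNDMASK_B32_e64), NewDst)
          .addImm(0)              // src0_modifiers
          .add(Sel.getOperand(2)) // src0 = S_CSELECT's "false" value
          .addImm(0)              // src1_modifiers
          .add(Sel.getOperand(1)) // src1 = S_CSELECT's "true" value
          .addReg(Cond);          // src2 = per-lane condition
  MRI.replaceRegWith(Sel.getOperand(0).getReg(), NewDst);
  // Fix up constant-bus violations if both select values ended up in SGPRs.
  TII->legalizeOperands(*CndMask);
  Sel.eraseFromParent();
  if (MRI.use_empty(And.getOperand(0).getReg()))
    And.eraseFromParent();
}
```
In this example the only remaining use of the old result is the COPY to $vgpr0, so after replaceRegWith it becomes a plain VGPR-to-VGPR copy that later passes can fold away.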
https://github.com/llvm/llvm-project/pull/111688