[llvm] Rework i1->i32 zext/anyext translation (PR #114721)

Mon Nov 4 01:59:59 PST 2024

================
@@ -740,10 +740,12 @@ define amdgpu_kernel void @fp_to_uint_f32_to_i1(ptr addrspace(1) %out, float %in
 ; SI-NEXT:    s_load_dword s4, s[2:3], 0xb
 ; SI-NEXT:    s_load_dwordx2 s[0:1], s[2:3], 0x9
 ; SI-NEXT:    s_mov_b32 s3, 0xf000
-; SI-NEXT:    s_mov_b32 s2, -1
 ; SI-NEXT:    s_waitcnt lgkmcnt(0)
 ; SI-NEXT:    v_cmp_eq_f32_e64 s[4:5], -1.0, s4
-; SI-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s[4:5]
+; SI-NEXT:    s_and_b64 s[4:5], s[4:5], exec
+; SI-NEXT:    s_cselect_b32 s4, 1, 0
+; SI-NEXT:    s_mov_b32 s2, -1
+; SI-NEXT:    v_mov_b32_e32 v0, s4
----------------
jayfoad wrote:

Regression here: the single v_cndmask_b32 becomes s_and_b64 + s_cselect_b32 + v_mov_b32. This looks like a case where the result (s4) is uniform but is needed in a VGPR (v0) anyway, so we might as well use v_cndmask in the first place. Maybe #113705 would help with this.
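
To make the comparison concrete, here is a rough Python model of the two SI sequences in the hunk. It is only an illustrative sketch, not LLVM code: the helper names `old_sequence` and `new_sequence` are hypothetical, a 64-lane wave is modelled as an integer bitmask, and the condition is assumed to be uniform (all active lanes agree), which is the case the comment describes.

```python
# Toy model of the two instruction sequences (hypothetical helpers).
# cond_mask plays the role of s[4:5] written by v_cmp_eq_f32_e64;
# exec_mask is the set of active lanes.
WAVE_SIZE = 64


def old_sequence(cond_mask, exec_mask):
    """v_cndmask_b32_e64 v0, 0, 1, s[4:5]: each active lane selects 1 or 0."""
    v0 = {}
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1:
            v0[lane] = 1 if (cond_mask >> lane) & 1 else 0
    return v0, 1  # (per-lane result, instruction count)


def new_sequence(cond_mask, exec_mask):
    """s_and_b64 + s_cselect_b32 + v_mov_b32_e32."""
    # s_and_b64 s[4:5], s[4:5], exec sets SCC when the result is non-zero.
    scc = (cond_mask & exec_mask) != 0
    # s_cselect_b32 s4, 1, 0 picks 1 when SCC is set, else 0.
    s4 = 1 if scc else 0
    # v_mov_b32_e32 v0, s4 copies the scalar into every active lane.
    v0 = {lane: s4 for lane in range(WAVE_SIZE) if (exec_mask >> lane) & 1}
    return v0, 3  # (per-lane result, instruction count)


if __name__ == "__main__":
    exec_mask = 0xF0F0
    for cond in (0, exec_mask):  # uniform: no active lane or every active lane
        old_v0, old_n = old_sequence(cond, exec_mask)
        new_v0, new_n = new_sequence(cond, exec_mask)
        print(old_v0 == new_v0, old_n, new_n)
```

Under the uniformity assumption both sequences leave the same value in v0 for every active lane, so the extra scalar instructions buy nothing when the value has to end up in a VGPR regardless.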

https://github.com/llvm/llvm-project/pull/114721