[llvm] Rework i1->i32 zext/anyext translation (PR #114721)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 4 01:59:59 PST 2024
================
@@ -740,10 +740,12 @@ define amdgpu_kernel void @fp_to_uint_f32_to_i1(ptr addrspace(1) %out, float %in
; SI-NEXT: s_load_dword s4, s[2:3], 0xb
; SI-NEXT: s_load_dwordx2 s[0:1], s[2:3], 0x9
; SI-NEXT: s_mov_b32 s3, 0xf000
-; SI-NEXT: s_mov_b32 s2, -1
; SI-NEXT: s_waitcnt lgkmcnt(0)
; SI-NEXT: v_cmp_eq_f32_e64 s[4:5], -1.0, s4
-; SI-NEXT: v_cndmask_b32_e64 v0, 0, 1, s[4:5]
+; SI-NEXT: s_and_b64 s[4:5], s[4:5], exec
+; SI-NEXT: s_cselect_b32 s4, 1, 0
+; SI-NEXT: s_mov_b32 s2, -1
+; SI-NEXT: v_mov_b32_e32 v0, s4
----------------
jayfoad wrote:
Regression here: a single v_cndmask_b32 becomes three instructions (s_and_b64, s_cselect_b32, v_mov_b32). This looks like a case where the result (s4) is uniform, but it will be needed in a VGPR (v0) anyway, so we might as well use v_cndmask in the first place. Maybe #113705 would help with this.
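
To make the equivalence concrete, here is a rough Python model of the two sequences from the hunk above. It is only a sketch under simplifying assumptions: wave64 (suggested by the 64-bit s[4:5] mask), lanes modeled as list entries, inactive lanes shown as None, and all function names are made up for illustration rather than taken from LLVM.

```python
WAVE_SIZE = 64  # assumption: wave64, matching the 64-bit s[4:5] mask


def compare_uniform(cond, exec_mask):
    """Model v_cmp on a uniform operand: every active lane gets the same bit."""
    return exec_mask if cond else 0


def lower_valu(mask, exec_mask):
    """Old sequence: v_cndmask_b32 v0, 0, 1, mask (one VALU instruction)."""
    return [
        (mask >> lane) & 1 if (exec_mask >> lane) & 1 else None
        for lane in range(WAVE_SIZE)
    ]


def lower_salu(mask, exec_mask):
    """New sequence: s_and_b64, s_cselect_b32, then v_mov_b32."""
    scc = (mask & exec_mask) != 0  # s_and_b64 sets SCC if the result is nonzero
    s4 = 1 if scc else 0           # s_cselect_b32 s4, 1, 0
    return [                       # v_mov_b32 v0, s4 (copy to active lanes)
        s4 if (exec_mask >> lane) & 1 else None
        for lane in range(WAVE_SIZE)
    ]


if __name__ == "__main__":
    full_exec = (1 << WAVE_SIZE) - 1
    for cond in (False, True):
        mask = compare_uniform(cond, full_exec)
        # For a uniform condition both lowerings write the same v0.
        print(cond, lower_valu(mask, full_exec) == lower_salu(mask, full_exec))
```

In this model the two lowerings agree whenever the mask is uniform across the active lanes, so the scalar form only adds the extra copy into v0. For a divergent mask (for example mask 0b01 with exec 0b11) they differ, which is why the scalar form is only valid in the uniform case.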
https://github.com/llvm/llvm-project/pull/114721