[llvm] WIP: [AMDGPU] Use s_cselect_b32 for uniform select of f32 values (PR #111688)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 22 13:13:59 PDT 2024
================
@@ -9,7 +9,9 @@ define amdgpu_ps float @xor3_i1_const(float inreg %arg1, i32 inreg %arg2) {
; GCN-NEXT: v_cmp_lt_f32_e64 s[2:3], s0, 0
; GCN-NEXT: v_cmp_lt_f32_e32 vcc, s0, v0
; GCN-NEXT: s_and_b64 s[0:1], s[2:3], vcc
-; GCN-NEXT: v_cndmask_b32_e64 v0, 1.0, 0, s[0:1]
+; GCN-NEXT: s_and_b64 s[0:1], s[0:1], exec
+; GCN-NEXT: s_cselect_b32 s0, 0, 1.0
+; GCN-NEXT: v_mov_b32_e32 v0, s0
----------------
alex-t wrote:
I sketched out a change that does what we want, but it does not handle the general case. It does produce good asm for your example, though :)
v_mov_b32_e32 v0, 0x42640000
v_cmp_lt_f32_e64 s[2:3], s0, 0
v_cmp_lt_f32_e32 vcc, s0, v0
s_and_b64 s[0:1], s[2:3], vcc
v_cndmask_b32_e64 v0, 1.0, 0, s[0:1]
I will create a PR for the change so that you can look at the code.
https://github.com/llvm/llvm-project/pull/111688
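
For readers following the hunk quoted at the top, here is a rough Python model of why the `s_and_b64 ..., exec` + `s_cselect_b32` + `v_mov_b32` sequence can stand in for `v_cndmask_b32` when the condition is uniform. It is only a toy sketch, not LLVM code: the function names are Python stand-ins that mirror the instruction mnemonics, wave64 is assumed, and the mask is assumed to be zero in inactive lanes (as a `v_cmp` result would be).

```python
# Toy model (not LLVM code) of the two instruction sequences in the hunk.
# Assumptions: wave64, 32-bit values handled as raw bit patterns.
WAVE_SIZE = 64
ONE_F32 = 0x3F800000  # IEEE-754 bit pattern of 1.0f


def v_cndmask_b32(src0, src1, mask, exec_mask):
    """Per-lane select: each active lane gets src1 if its mask bit is set, else src0."""
    out = {}
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1:
            out[lane] = src1 if (mask >> lane) & 1 else src0
    return out


def s_and_b64(a, b):
    """Scalar AND; SCC is set to 1 when the 64-bit result is non-zero."""
    result = a & b
    return result, int(result != 0)


def s_cselect_b32(src0, src1, scc):
    """Scalar select: src0 if SCC is set, else src1."""
    return src0 if scc else src1


def uniform_select(mask, exec_mask):
    """s_and_b64 with exec, then s_cselect_b32, then v_mov_b32 to all active lanes."""
    _, scc = s_and_b64(mask, exec_mask)
    s0 = s_cselect_b32(0, ONE_F32, scc)
    return {lane: s0 for lane in range(WAVE_SIZE) if (exec_mask >> lane) & 1}
```

With a uniform condition the mask is either zero or equal to `exec` on the active lanes, so both paths give every active lane the same value. For a divergent mask the two paths differ, which is why the scalar form is only valid for uniform selects.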