[llvm] [AMDGPU] Simplify and improve codegen for llvm.amdgcn.set.inactive (PR #107889)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 9 09:43:38 PDT 2024
================
@@ -145,15 +137,16 @@ define amdgpu_kernel void @set_inactive_scc(ptr addrspace(1) %out, i32 %in, <4 x
define amdgpu_kernel void @set_inactive_f32(ptr addrspace(1) %out, float %in) {
; GCN-LABEL: set_inactive_f32:
; GCN: ; %bb.0:
-; GCN-NEXT: s_load_dword s6, s[2:3], 0x2c
+; GCN-NEXT: s_load_dword s4, s[2:3], 0x2c
; GCN-NEXT: s_load_dwordx2 s[0:1], s[2:3], 0x24
-; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
+; GCN-NEXT: s_or_saveexec_b64 s[2:3], -1
; GCN-NEXT: v_mov_b32_e32 v0, 0x40400000
+; GCN-NEXT: s_mov_b64 exec, s[2:3]
; GCN-NEXT: s_mov_b32 s2, -1
-; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: s_waitcnt lgkmcnt(0)
-; GCN-NEXT: v_mov_b32_e32 v0, s6
-; GCN-NEXT: s_mov_b64 exec, -1
+; GCN-NEXT: v_mov_b32_e32 v1, s4
----------------
jayfoad wrote:
This v_mov_b32 is an extra instruction caused by the constant bus restriction on v_cndmask_b32, but that restriction is relaxed on GFX10+, and it is a pretty minor penalty anyway.
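
To illustrate the constant bus point, here is a minimal sketch in Python. It is a toy model, not LLVM code. It assumes a VALU instruction may read one distinct scalar register over the constant bus before GFX10 and two on GFX10+. The helper names and the v_cndmask_b32 operand lists are hypothetical, since the select itself is not visible in the quoted hunk. Inline constants and literals are not modelled.

```python
# Toy model of the VALU constant bus limit (hypothetical helpers, not LLVM APIs).
# Operands are register names as strings: "v0" is a VGPR, "s4" or "s[2:3]" an SGPR.

def constant_bus_uses(operands):
    """Count distinct scalar registers read; a repeated SGPR counts once."""
    return len({op for op in operands if op.startswith("s")})


def is_legal(operands, gfx10_plus):
    """Assumed limits: 1 constant bus read before GFX10, 2 on GFX10+."""
    limit = 2 if gfx10_plus else 1
    return constant_bus_uses(operands) <= limit


# Assumed source operands of the select: the inactive value in v0, the
# active value, and the lane mask held in an SGPR pair.
direct = ["v0", "s4", "s[2:3]"]     # reads s4 directly: two scalar reads
via_vgpr = ["v0", "v1", "s[2:3]"]   # s4 copied to v1 first: one scalar read

print(is_legal(direct, gfx10_plus=False))    # False
print(is_legal(via_vgpr, gfx10_plus=False))  # True
print(is_legal(direct, gfx10_plus=True))     # True
```

Under these assumptions the copy into v1 is only needed on pre-GFX10 targets, which is consistent with the extra v_mov_b32 in the GCN check lines above.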
https://github.com/llvm/llvm-project/pull/107889