[llvm] [AMDGPU] V_SET_INACTIVE optimizations (PR #98864)

Carl Ritson via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 3 22:27:54 PDT 2024


================
@@ -816,12 +816,10 @@ define amdgpu_kernel void @global_atomic_fadd_uni_address_div_value_agent_scope_
 ; GFX9-DPP-NEXT:    v_mbcnt_hi_u32_b32 v1, exec_hi, v1
 ; GFX9-DPP-NEXT:    s_or_saveexec_b64 s[0:1], -1
 ; GFX9-DPP-NEXT:    v_bfrev_b32_e32 v3, 1
+; GFX9-DPP-NEXT:    v_bfrev_b32_e32 v4, 1
 ; GFX9-DPP-NEXT:    s_mov_b64 exec, s[0:1]
 ; GFX9-DPP-NEXT:    v_mov_b32_e32 v4, v0
-; GFX9-DPP-NEXT:    s_not_b64 exec, exec
-; GFX9-DPP-NEXT:    v_bfrev_b32_e32 v4, 1
-; GFX9-DPP-NEXT:    s_not_b64 exec, exec
-; GFX9-DPP-NEXT:    s_or_saveexec_b64 s[0:1], -1
+; GFX9-DPP-NEXT:    s_mov_b64 exec, -1
----------------
perlfu wrote:

The GFX9 constant bus/literal limit prevents us from merging all four.
On GFX10+ they are all merged.
In this case we could use `v_cndmask_b32 v4, v0, v4, s[0:1]` for the last three, as the destination is one of the sources.
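The rewrite in the hunk relies on the old and new sequences leaving identical per-lane values in v4. Below is a minimal Python sketch of that equivalence. It rests on a few simplifying assumptions: the wave has 64 lanes, each VGPR is a list of 32-bit lane values, and a VALU write updates only lanes whose exec bit is set. The helper names (`vwrite`, `old_sequence`, `new_sequence`, `masked_select`) are hypothetical. The model ignores encodings and the constant bus limit entirely. `masked_select` stands in for a single per-lane select keyed on the saved mask in s[0:1], which is the kind of merge the `v_cndmask_b32` suggestion describes.

```python
# Hypothetical lane-wise model of AMDGPU exec masking (not LLVM code).
# Assumptions: 64-lane wave; a VGPR is a list of 64 ints; a VALU write
# only updates lanes whose exec bit is set.
WAVE = 64
FULL = (1 << WAVE) - 1


def bfrev32(x: int) -> int:
    """Reverse the bits of a 32-bit value, like v_bfrev_b32."""
    return int(f"{x & 0xFFFFFFFF:032b}"[::-1], 2)


# v_bfrev_b32 vN, 1 materializes 0x80000000, the bit pattern of -0.0f.
IDENT = bfrev32(1)


def vwrite(dst, src, exec_mask):
    """Copy src into dst for every lane whose exec bit is set."""
    return [s if (exec_mask >> lane) & 1 else d
            for lane, (d, s) in enumerate(zip(dst, src))]


def old_sequence(v0, v4, exec_mask):
    # v_mov_b32 v4, v0 under the original exec.
    v4 = vwrite(v4, v0, exec_mask)
    # s_not_b64 exec, exec ; v_bfrev_b32 v4, 1 ; s_not_b64 exec, exec
    # i.e. write the identity only into the inactive lanes.
    return vwrite(v4, [IDENT] * WAVE, ~exec_mask & FULL)


def new_sequence(v0, v4, exec_mask):
    # s_or_saveexec_b64 s[0:1], -1 saves exec; v_bfrev_b32 v4, 1 runs
    # with every lane enabled.
    saved = exec_mask
    v4 = vwrite(v4, [IDENT] * WAVE, FULL)
    # s_mov_b64 exec, s[0:1] ; v_mov_b32 v4, v0 under the original exec.
    return vwrite(v4, v0, saved)


def masked_select(v0, v4, mask):
    """One per-lane select: lanes with the mask bit set take v0, others keep v4."""
    return [a if (mask >> lane) & 1 else b
            for lane, (a, b) in enumerate(zip(v0, v4))]
```

Usage: once v4 holds `IDENT` in every lane, `masked_select(v0, v4, exec_mask)` yields the same lanes as `new_sequence(v0, v4, exec_mask)`. This is why the trailing exec restore, move and exec reset could collapse into one select whose destination is also one of its sources.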

https://github.com/llvm/llvm-project/pull/98864

