[llvm] [AMDGPU] Lazily emit waitcnts on function entry (PR #73122)

Jay Foad via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 23 08:21:53 PST 2023


================
@@ -33,14 +32,21 @@ define <4 x i16> @vec_8xi16_extract_4xi16(ptr addrspace(1) %p0, ptr addrspace(1)
 ; SI-NEXT:    v_lshlrev_b32_e32 v3, 16, v4
 ; SI-NEXT:    v_or_b32_e32 v2, v6, v2
 ; SI-NEXT:    v_or_b32_e32 v3, v5, v3
-; SI-NEXT:    s_mov_b64 vcc, exec
-; SI-NEXT:    s_cbranch_execz .LBB0_3
+; SI-NEXT:    s_mov_b64 s[4:5], 0
+; SI-NEXT:    s_andn2_b64 vcc, exec, s[4:5]
+; SI-NEXT:    s_waitcnt lgkmcnt(0)
+; SI-NEXT:    s_mov_b64 vcc, vcc
----------------
jayfoad wrote:

Right, `SIInsertWaitcnts::insertWaitcntInBlock` generates `s_waitcnt lgkmcnt(0)` and `s_mov_b64 vcc, vcc` to work around a bug on GFX6 where vccz could be clobbered by completion of an SMEM load. Then `SIPreEmitPeephole::optimizeVccBranch` fails to look through the mov.

This only affects GFX6, so I don't think it's worth trying to fix it.
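
To illustrate the "fails to look through the mov" point, here is a toy model of the situation. It is not the actual `SIPreEmitPeephole` code: the `Inst` struct, `findDef`, `canSeeAndn2`, and the `lookThroughSelfCopy` flag are all hypothetical names made up for this sketch. It only assumes that the peephole scans backwards for the instruction that defines `vcc` and expects to find the `s_andn2_b64` there.

```cpp
#include <string>
#include <vector>

// Toy instruction: opcode, destination register, source operands.
// Hypothetical model only; real MachineInstrs look nothing like this.
struct Inst {
    std::string op;
    std::string dst;
    std::vector<std::string> srcs;
};

// Scan backwards from the end of the block for the most recent write to `reg`.
// Returns its index, or -1 if there is none. If lookThroughSelfCopy is set,
// a no-op copy such as "s_mov_b64 vcc, vcc" is skipped rather than being
// reported as the definition.
int findDef(const std::vector<Inst>& block, const std::string& reg,
            bool lookThroughSelfCopy) {
    for (int i = static_cast<int>(block.size()) - 1; i >= 0; --i) {
        const Inst& mi = block[i];
        if (mi.dst != reg)
            continue;
        bool selfCopy = mi.op == "s_mov_b64" && mi.srcs.size() == 1 &&
                        mi.srcs[0] == reg;
        if (selfCopy && lookThroughSelfCopy)
            continue;
        return i;
    }
    return -1;
}

// True if the definition of vcc found by the scan is the s_andn2_b64
// that a branch peephole would want to fold.
bool canSeeAndn2(const std::vector<Inst>& block, bool lookThroughSelfCopy) {
    int i = findDef(block, "vcc", lookThroughSelfCopy);
    return i >= 0 && block[i].op == "s_andn2_b64";
}

// The tail of the SI sequence from the diff above.
std::vector<Inst> exampleBlock() {
    return {
        {"s_mov_b64", "s[4:5]", {"0"}},
        {"s_andn2_b64", "vcc", {"exec", "s[4:5]"}},
        {"s_waitcnt", "", {"lgkmcnt(0)"}},
        {"s_mov_b64", "vcc", {"vcc"}},
    };
}
```

Without the look-through, the scan stops at the `s_mov_b64 vcc, vcc` workaround instruction and never reaches the `s_andn2_b64`; with it, the scan reaches the instruction the peephole is interested in. Whether a look-through like this would be safe in the real pass is a separate question that this sketch does not address.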

https://github.com/llvm/llvm-project/pull/73122

