[llvm] [AMDGPU] Lazily emit waitcnts on function entry (PR #73122)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 22 06:06:38 PST 2023
================
@@ -33,14 +32,21 @@ define <4 x i16> @vec_8xi16_extract_4xi16(ptr addrspace(1) %p0, ptr addrspace(1)
; SI-NEXT: v_lshlrev_b32_e32 v3, 16, v4
; SI-NEXT: v_or_b32_e32 v2, v6, v2
; SI-NEXT: v_or_b32_e32 v3, v5, v3
-; SI-NEXT: s_mov_b64 vcc, exec
-; SI-NEXT: s_cbranch_execz .LBB0_3
+; SI-NEXT: s_mov_b64 s[4:5], 0
+; SI-NEXT: s_andn2_b64 vcc, exec, s[4:5]
+; SI-NEXT: s_waitcnt lgkmcnt(0)
+; SI-NEXT: s_mov_b64 vcc, vcc
----------------
jayfoad wrote:
This is very poor code. Maybe the placement of the waitcnt interfered with some peephole optimization that was supposed to clean it up?
https://github.com/llvm/llvm-project/pull/73122